Introducing Thoth: A Manuscript Evaluation Tool

I’ve been tinkering around with code to analyze my manuscripts for a few years, and I finally got serious enough about it to build a real-life application. I named the app Thoth after the Egyptian god of writing and magic (among other things).

After writing several bad books and getting helpful (though sometimes painful) feedback, I realized I had a number of tendencies that showed up as weak writing. I also figured that, as an experienced programmer, I could make my own revision process easier by writing some code to analyze my manuscripts and flag those weaknesses with fancy charts and whatnot.

I believe other writers could benefit from my code, and so I’ve released the initial beta version of the application here. It’s free to download and use!

Admittedly, it’s far from perfect. With my writing, I’m typically very reluctant to let anyone else see it until I’ve spent months revising it. This is essentially a first draft, and as we all know, all software has bugs. Mine is no exception. The format of the PDF report generated could be cleaner, and the text I coded in could be better written. Also, I wish the download process was a little faster and easier. (It’s not really that bad, I promise!)

I plan to work on improving these weaknesses, as well as adding more features, in the future. But cut me a break here, please – you have no idea how much fucking time I spent on Stack Overflow trying to figure out why matplotlib was crashing the app and why pyinstaller and plotly don't play nice together. ON MY BIRTHDAY, NO LESS.

Even so, I think other writers should give it a try. Oh, hey, did I mention it's TOTALLY AND COMPLETELY FREE?

Here are the reports generated within the PDF file (a small illustrative sketch of one such check follows the list):

  • Dialogue %
  • # of dialogue beats
  • Sentence fragments
  • Repetitive cadences
  • Unusual narrative punctuation
  • Adjectives/adverbs
  • ‘Crutch’ words (that, just, etc.)
  • Filter words
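
To give a flavor of what these checks involve, here's a tiny sketch of a crutch-word counter. This is purely illustrative – it isn't Thoth's actual code, and the word list is just an example:

```python
# Purely illustrative sketch -- not Thoth's actual implementation.
# Counts occurrences of a few example 'crutch' words in a manuscript.
import re
from collections import Counter

CRUTCH_WORDS = {"that", "just", "very", "really"}  # example list only

def crutch_word_counts(text):
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in CRUTCH_WORDS)

print(crutch_word_counts("I just think that that was very, very odd."))
# e.g. Counter({'that': 2, 'very': 2, 'just': 1})
```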

Please give it a try 😁

Population Immunity vs COVID-19 Spread Rate, Cont’d

In my previous post I demonstrated a strong negative correlation between cumulative COVID cases and Rt (the current rate of reproduction of the virus) at the county level in the US. I mentioned, though, that my quick and dirty data analysis was incomplete – a univariate analysis can be misleading if there are confounding factors. In this post, I expand the data to a multivariate model to examine possibly correlated factors.

For those who think the hypothesis I expressed in my previous post (that population immunity is the primary factor determining Rt) is wrong, there are two likely counterarguments:

  • People in regions more strongly impacted by COVID take it more seriously, leading to more social distancing and mask wearing.
  • Mask mandates have generally been introduced in places with high case loads, and it's those mandates that are mostly responsible for the reduction in spread.

The multivariate model I present here includes 3 new factors:

  • Current cases per capita
  • Social mobility (from the Apple mobility indexes)
  • Statewide mask mandates (in effect or not as of each date)

The base dataset and date ranges remain the same as in my previous post; each data point corresponds to one major US county on every fourth Tuesday. Here are the results from an OLS regression:

Multiple linear regression results predicting Rt by county. Cumulative cases per capita and mask mandates both show strong statistical significance in lowering Rt. Current cases per capita and social mobility fail to pass the significance test (p = 0.197 and 0.225, respectively).

The R2 value isn't terribly high, so we need to be careful about drawing strong conclusions (a low R2 means a lot of the variance is left unexplained). But the results do suggest some meaningful takeaways:

  • My view remains unchallenged. Even after accounting for social mobility and mask wearing (mandates), the cumulative case rate (which is associated with population-level immunity) is by far the best predictor of Rt.
  • Surprisingly, the coefficient for mobility is negative – which would imply that higher mobility leads to reduced virus spread, if the coefficient were statistically significant (p<0.05). It is not. This could suggest that Stay at Home/Shelter in Place orders have been worthless for containing the spread in America, though I suspect it might instead mean that the mobility data doesn't accurately measure what matters for spreading the virus. I think weather may play a major role here – if people are going places but staying outside during the summer, that is effectively quite different from traveling places in the winter, when gathering must occur indoors in much of the nation.
  • Mask mandates do appear to contribute to reducing the spread, though they don’t guarantee anything.
  • Current cases per capita do not seem to matter (another indication that people in current hot spots don't adjust their behavior enough to reduce Rt).

Some notes and caveats:

  • I normalized all input variables to have zero mean and a standard deviation of 1 (see the sketch after this list).
  • I smoothed the Apple mobility indexes with a 14-day moving average, then computed a single score by averaging the walking/driving/transit scores. This score may not give the best predictor of Rt, but I wanted to keep it as simple as possible.
  • Ideally, the mobility data would’ve been measured year-over-year, instead of indexed in January. But this is the data I have.
  • I should’ve smoothed the Current Cases /Capita with a moving average, but I got lazy.
  • The statistical significance numbers are likely to be modestly overstated, as the data points are likely not all 100% independent (bordering counties, repeated points from the same location 4 weeks apart).
  • I got most of the dates for the beginning of statewide mask mandates here, though I had to Google 2 or 3 missing dates. Some counties/cities had mask mandates before their states implemented them, but I’m not sure I’m willing to put in the time/effort to collect that data.
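
For anyone who wants to try something similar, here's a minimal sketch of the preprocessing and regression described above. The column names and input file are placeholders, not my actual dataset's schema:

```python
# Minimal sketch of the preprocessing and OLS fit described above.
# Column names and the input file are placeholders.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("county_data.csv")

# 14-day moving average of each Apple mobility series, then average them into one score
for col in ["walking", "driving", "transit"]:
    df[col] = df.groupby("county")[col].transform(lambda s: s.rolling(14).mean())
df["mobility"] = df[["walking", "driving", "transit"]].mean(axis=1)

# Standardize every predictor to zero mean and unit standard deviation
predictors = ["cum_cases_per_capita", "cur_cases_per_capita", "mobility", "mask_mandate"]
X = (df[predictors] - df[predictors].mean()) / df[predictors].std()

# OLS regression predicting Rt from the standardized predictors
model = sm.OLS(df["rt"], sm.add_constant(X), missing="drop").fit()
print(model.summary())
```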

Population Immunity vs COVID-19 Spread Rate

In my last post, I mentioned the idea that population immunity, or the total % of the population that has been infected, is a major determinant of COVID-19 spread. I displayed a chart of daily new cases in NYC and compared it to social mobility data, showing an apparent negative correlation between mobility and cases. My assertion is that population-level immunity is more important than many other factors in determining how fast the virus spreads. I’d like to add a little more support for that view here.

Another piece of anecdotal evidence comes from my second home, Los Angeles County:

From the above chart, you can see that the daily new case count peaked in mid-July, even though lockdowns were enforced beginning in March and a mask mandate has been in place since May. Yet since mid-July, new cases have fallen steadily, with little or no decrease in mobility over that time:

Now, anecdotal evidence is all well and good, but I much prefer statistical evidence when available, so I pulled some county-level data from a COVID tracking website, with estimates for the Rt value by U.S. county for each date during the crisis.

*I’ll note before giving the results that a more complete analysis than I’ve done would incorporate multiple variables (e.g. mask usage, mobility) to ensure I’m not picking up on secondary effects from correlated variables. Perhaps I’ll look at doing that in the future, but that requires substantially more work.*

(Update: My follow-up post looks at a more complete model).

I filtered the data to select only counties with a population of at least 250k, which gave me a total of 273 counties. I looked at the (smoothed) Rt values for every Tuesday during the crisis, comparing them to the % of the population that had tested positive for COVID by that date. Here's a scatter plot:

The correlation between these 2 variables is -0.52. Of course, there are many other factors that determine Rt, some of which are mostly random, but population infection rate (immunity) is clearly a large factor. Note that everyone agrees the total number of infections is much greater than the number of confirmed cases, though the ratio varies by region. With a 10x multiplier (typical for the U.S., I think), a 2% case rate implies 20% total infected.
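
The core of that calculation is simple; here's a sketch with placeholder column names (not the tracking site's actual schema):

```python
# Sketch of the filtering and correlation described above (placeholder column names).
import pandas as pd

df = pd.read_csv("rt_by_county.csv", parse_dates=["date"])  # hypothetical export
df = df[df["population"] >= 250_000]                        # counties of at least 250k
df = df[df["date"].dt.dayofweek == 1]                       # Tuesdays only (Monday == 0)

# Pearson correlation between cumulative case rate and Rt
print(df["cum_case_rate"].corr(df["rt"]))
```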

Here’s a box plot comparing Rt for all instances above/below a threshold of 2% total case rate:

A statistical comparison of the 2 datasets gives:

The significance stats are somewhat overstated, as successive Tuesdays' numbers for each county will not be truly independent. But I've tried running these analyses by “undersampling” the dates (e.g. only using 1 Tuesday per month, or even fewer), and I still saw strong significance in all tests.
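
Continuing from the sketch above, the group comparison and the undersampling check look roughly like this (using Welch's t-test as one reasonable choice of comparison):

```python
# Sketch of the above/below-2% comparison and the undersampling check.
from scipy import stats

low = df.loc[df["cum_case_rate"] < 0.02, "rt"]
high = df.loc[df["cum_case_rate"] >= 0.02, "rt"]
print(stats.ttest_ind(high, low, equal_var=False))  # Welch's t-test on the two groups

# Undersample to one Tuesday per month to reduce dependence between points
monthly = df[df["date"].dt.day <= 7]                # first Tuesday of each month
low_m = monthly.loc[monthly["cum_case_rate"] < 0.02, "rt"]
high_m = monthly.loc[monthly["cum_case_rate"] >= 0.02, "rt"]
print(stats.ttest_ind(high_m, low_m, equal_var=False))
```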

As I mentioned in the previous post regarding NYC, these high case rates don’t indicate real herd immunity. Instead, I suggest we stop thinking about herd immunity as a binary concept, and realize that for places with low population immunity, suppressing the spread is incredibly difficult, regardless of social distancing, masks, etc.

A Closer Look at 3-pt Defense


Photo by Brian Georgeson

As the opening weekend of the big dance approaches, and Marquette is finally back where it belongs, I wanted to dig into a stat that will be very relevant to us in Greenville, SC: 3-point defense.

Our first-round matchup is against South Carolina, a team that sports one of the highest-ranked 3-point defenses in the nation. I've seen some fans on #mubb twitter question whether this ranking is deserved, based in part on the relatively poor 3-point offenses in the SEC. This week I've been playing around with the 2017 NCAA Division I men's basketball game data provided by Kaggle, so I thought I'd write some code to do a brief analysis on this subject.

My method is as follows: For each game a team played, compute:

  • Opponent’s 3-pt % for that game
  • Opponent’s season 3-pt % after removing games played against team of record

I can then compute basic stats on the differences between those paired quantities, and run a hypothesis test to determine how likely it is that any difference is due to chance.
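
In code, the core of the method looks something like the sketch below. The column names are illustrative (not the exact Kaggle schema), and the paired test shown is the standard one-sample t-test on the differences:

```python
# Sketch of the paired-difference computation (illustrative column names).
import pandas as pd
from scipy import stats

games = pd.read_csv("games_2017.csv")  # one row per (team, opponent, game)

def defense_diffs(team, games):
    """For each of `team`'s games: opponent's season 3-pt% (excluding games
    against `team`) minus the opponent's 3-pt% in that game.
    Positive values mean the opponent shot worse than usual."""
    rows = games[games["team"] == team]
    diffs = []
    for _, g in rows.iterrows():
        opp = g["opponent"]
        others = games[(games["team"] == opp) & (games["opponent"] != team)]
        season_pct = others["fg3_made"].sum() / others["fg3_att"].sum()
        game_pct = g["opp_fg3_made"] / g["opp_fg3_att"]
        diffs.append(season_pct - game_pct)
    return pd.Series(diffs)

d = defense_diffs("South Carolina", games)
# Mean, stdev, and a one-tailed t-test of the paired differences against zero
print(d.mean(), d.std(), stats.ttest_1samp(d, 0.0, alternative="greater"))
```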

In the tables farther down, I show the basic stats for the top 25 teams using 3 different rankings:

  • Team’s defensive 3-pt %
  • Average difference between opponent’s 3-pt % against team of record and rest of league
  • T-stat of that difference

The columns in each table are:

  • Team
  • Games played
  • Defensive 3-pt %
  • Average of difference for opponent’s 3-pt % (positive = better than average)
  • Standard deviation of difference for opponent’s 3-pt %
  • T-stat on paired differences
  • Bootstrap-resampled p-value on paired differences*

*Bootstrap resampled p-value is a randomized estimate of the p-value. This method is valuable when the sample size is fairly small. The p-value here refers to the probability of encountering a difference as large as the one observed if the team's defense were simply average. In this case, I'm computing a one-tailed test to determine whether the team of record holds opponents to a lower-than-average 3-pt %. Lower p-values indicate a higher level of confidence that the team is better than average at 3-pt defense.
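
For anyone curious, here's a simplified sketch of one standard way to compute such a one-tailed bootstrap p-value on the paired differences (the exact resampling scheme can vary):

```python
# Simplified sketch of a one-tailed bootstrap p-value on the paired differences.
import numpy as np

def bootstrap_pvalue(diffs, n_boot=100_000, seed=0):
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs)
    observed = diffs.mean()
    null_diffs = diffs - observed    # recenter so the null hypothesis (mean = 0) holds
    samples = rng.choice(null_diffs, size=(n_boot, len(diffs)), replace=True)
    # Fraction of resampled means at least as large as the observed mean
    return np.mean(samples.mean(axis=1) >= observed)
```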

One other note: as I said, the data comes from Kaggle, and though I assume it’s clean, I can’t be sure there are no errors. I see slight discrepancies between my 3-pt defensive numbers and kenpom.com, so the data may not be perfect.

A couple of observations: South Carolina's 3-pt defense does appear to be better than average, but probably not quite as good as advertised. They are ranked 6th in defensive %, 13th in opponent's difference %, and 42nd in statistical significance. The lower ranking in significance comes from a higher variance in defensive % from game to game, and suggests some of their outperformance may be due to luck. But look who's highly ranked in all 3 categories (#1 in difference): Duke, a potential second-round matchup for Marquette.

(Also, we suck at this aspect of the game. No surprise there.)

Here are the ranked data:

Table I – Ranked by lowest 3-pt defensive %

Rank Team Games OppAvg Diff Stdev T-stat P-value
1 Morgan St 29 28.0% 5.4% 10.3% 2.822 0.0009
2 Rhode Island 33 29.1% 5.8% 10.9% 3.053 0.0005
3 NC Central 30 29.2% 3.9% 12.0% 1.770 0.0382
4 New Mexico St 30 29.8% 4.2% 11.7% 1.975 0.0224
5 Arizona 34 29.9% 5.9% 9.0% 3.808 0.0002
6 South Carolina 31 29.9% 4.6% 15.8% 1.627 0.0496
7 Duke 35 29.9% 6.4% 12.0% 3.140 0.0008
8 Gonzaga 33 30.0% 6.0% 11.1% 3.122 0.0008
9 Alcorn St 29 30.1% 2.8% 11.2% 1.326 0.0933
10 Wichita St 33 30.1% 5.5% 11.6% 2.719 0.0028
11 Minnesota 33 30.3% 5.5% 11.9% 2.658 0.0050
12 Robert Morris 33 30.4% 3.9% 11.7% 1.928 0.0241
13 Nevada 34 30.5% 4.4% 10.6% 2.428 0.0072
14 Louisville 32 30.6% 6.2% 11.1% 3.144 0.0011
15 St Mary’s CA 32 30.7% 5.1% 9.4% 3.084 0.0011
16 Col Charleston 33 30.7% 3.8% 9.0% 2.444 0.0048
17 Florida 32 30.8% 4.1% 10.0% 2.319 0.0065
18 New Orleans 28 30.8% 4.2% 8.6% 2.591 0.0040
19 FL Gulf Coast 30 31.0% 4.9% 9.0% 2.999 0.0011
20 Villanova 34 31.1% 5.5% 7.8% 4.153 0.0000
21 Colorado St 32 31.1% 3.9% 11.4% 1.918 0.0327
22 Seattle 27 31.1% 2.9% 10.8% 1.395 0.0771
23 Illinois St 32 31.1% 4.7% 7.9% 3.368 0.0003
24 Winthrop 30 31.2% 4.6% 10.8% 2.323 0.0092
25 Furman 30 31.5% 4.0% 9.2% 2.377 0.0070
284 Marquette 31 37.0% -1.9% 11.9% -0.885 0.8154

 

Table II – Ranked by biggest opponent’s difference

Rank Team Games OppAvg Diff Stdev T-stat P-value
1 Duke 35 29.9% 6.4% 12.0% 3.140 0.0008
2 Louisville 32 30.6% 6.2% 11.1% 3.144 0.0011
3 Gonzaga 33 30.0% 6.0% 11.1% 3.122 0.0008
4 Arizona 34 29.9% 5.9% 9.0% 3.808 0.0002
5 Rhode Island 33 29.1% 5.8% 10.9% 3.053 0.0005
6 Villanova 34 31.1% 5.5% 7.8% 4.153 0.0000
7 Minnesota 33 30.3% 5.5% 11.9% 2.658 0.0050
8 Wichita St 33 30.1% 5.5% 11.6% 2.719 0.0028
9 Morgan St 29 28.0% 5.4% 10.3% 2.822 0.0009
10 St Mary’s CA 32 30.7% 5.1% 9.4% 3.084 0.0011
11 FL Gulf Coast 30 31.0% 4.9% 9.0% 2.999 0.0011
12 Illinois St 32 31.1% 4.7% 7.9% 3.368 0.0003
13 South Carolina 31 29.9% 4.6% 15.8% 1.627 0.0496
14 Virginia 32 31.6% 4.6% 12.2% 2.146 0.0169
15 Winthrop 30 31.2% 4.6% 10.8% 2.323 0.0092
16 Nevada 34 30.5% 4.4% 10.6% 2.428 0.0072
17 New Mexico St 30 29.8% 4.2% 11.7% 1.975 0.0224
18 New Orleans 28 30.8% 4.2% 8.6% 2.591 0.0040
19 BYU 33 32.2% 4.1% 11.2% 2.111 0.0182
20 Florida 32 30.8% 4.1% 10.0% 2.319 0.0065
21 Baylor 31 31.6% 4.0% 9.9% 2.251 0.0123
22 Furman 30 31.5% 4.0% 9.2% 2.377 0.0070
23 Robert Morris 33 30.4% 3.9% 11.7% 1.928 0.0241
24 NC Central 30 29.2% 3.9% 12.0% 1.770 0.0382
25 Colorado St 32 31.1% 3.9% 11.4% 1.918 0.0327
267 Marquette 31 37.0% -1.9% 11.9% -0.885 0.8154

 

Table III – Ranked by strongest statistical significance

Rank Team Games OppAvg Diff Stdev T-stat P-value
1 Villanova 34 31.1% 5.5% 7.8% 4.153 0.0000
2 Arizona 34 29.9% 5.9% 9.0% 3.808 0.0002
3 Illinois St 32 31.1% 4.7% 7.9% 3.368 0.0003
4 Louisville 32 30.6% 6.2% 11.1% 3.144 0.0011
5 Duke 35 29.9% 6.4% 12.0% 3.140 0.0008
6 Gonzaga 33 30.0% 6.0% 11.1% 3.122 0.0008
7 St Mary’s CA 32 30.7% 5.1% 9.4% 3.084 0.0011
8 Rhode Island 33 29.1% 5.8% 10.9% 3.053 0.0005
9 FL Gulf Coast 30 31.0% 4.9% 9.0% 2.999 0.0011
10 Morgan St 29 28.0% 5.4% 10.3% 2.822 0.0009
11 Wichita St 33 30.1% 5.5% 11.6% 2.719 0.0028
12 Minnesota 33 30.3% 5.5% 11.9% 2.658 0.0050
13 New Orleans 28 30.8% 4.2% 8.6% 2.591 0.0040
14 Col Charleston 33 30.7% 3.8% 9.0% 2.444 0.0048
15 Nevada 34 30.5% 4.4% 10.6% 2.428 0.0072
16 Furman 30 31.5% 4.0% 9.2% 2.377 0.0070
17 Winthrop 30 31.2% 4.6% 10.8% 2.323 0.0092
18 Florida 32 30.8% 4.1% 10.0% 2.319 0.0065
19 Wyoming 30 32.1% 3.0% 7.0% 2.310 0.0096
20 Baylor 31 31.6% 4.0% 9.9% 2.251 0.0123
21 St Peter’s 32 31.9% 3.2% 8.2% 2.221 0.0119
22 Virginia 32 31.6% 4.6% 12.2% 2.146 0.0169
23 BYU 33 32.2% 4.1% 11.2% 2.111 0.0182
24 Oregon 33 31.9% 3.7% 10.2% 2.067 0.0209
25 Texas 33 32.6% 3.8% 10.5% 2.052 0.0206
42 South Carolina 31 29.9% 4.6% 15.8% 1.627 0.0496
266 Marquette 31 37.0% -1.9% 11.9% -0.885 0.8154

 

 

My 2016 NCAA Bracket

In my previous post, I introduced the quantitative rating system I’m using this year as a guide for filling out my bracket. I didn’t go strictly by the book – I went with the lower rated team on a small number of occasions. I did this mostly because the system’s picks were too boring (not enough upsets). Without further ado…


My first discretionary call was taking Yale over Baylor. I couldn’t NOT pick a 12-5 upset, and according to my system, this is the most probable one.

I then took Cincinnati over Oregon in the second round, even though Oregon is rated higher. It seems like this year has the potential to be really bloody, and I doubt all the 1 seeds will make it through to the next weekend. I have Oregon as the lowest rated 1 seed (by a fair margin), so I knocked those ducks out.

I also felt a need to have at least one low-rated seed (10+) in the Sweet 16 (I expect there will be more than one, possibly several). Gonzaga looked like the most probable team to pull this off.

Also, though the ratings on my previous post show Kansas as the #1 team, I reran the algorithm after eliminating the Michigan St games that Denzel Valentine missed due to injury, and that change put the Spartans on top. Was I cheating to get the team I wanted to pick as the champs into the #1 slot in my system? Yeah, sure, maybe. But Valentine is arguably the best player in the nation, and missing him obviously hurt their rating.

One final note before I display the game-by-game win probabilities throughout the tournament (adhering to my system’s calls): I did another trial with my algorithm where I replaced the margin-of-victory game probability extrapolation with one that ignores the score. I noticed a couple of teams rated much higher in those results, and I think they could be dangerous – St. Joe’s and Seton Hall.

Below, you can see the game-by-game probabilities by round given by my system:

First Round

Kansas 93.21% Austin Peay 6.79%
Connecticut 56.33% Colorado 43.67%
Maryland 65.78% San Diego St 34.22%
California 63.98% Hawaii 36.02%
Arizona 53.62% Wichita St 46.38%
Miami FL 80.45% Buffalo 19.55%
Iowa 67.52% Temple 32.48%
Villanova 87.00% UNC Asheville 13.00%
Oregon 92.62% Holy Cross 7.38%
Cincinnati 54.43% St Joseph’s PA 45.57%
Baylor 63.76% Yale 36.24%
Duke 74.28% UNC Wilmington 25.72%
Texas 68.31% Northern Iowa 31.69%
Texas A&M 80.12% WI Green Bay 19.88%
VA Commonwealth 57.44% Oregon St 42.56%
Oklahoma 81.15% CS Bakersfield 18.85%
North Carolina 91.22% FL Gulf Coast 8.78%
USC 57.14% Providence 42.86%
Indiana 76.02% Chattanooga 23.98%
Kentucky 75.99% Stony Brook 24.01%
Notre Dame 53.04% Michigan 46.96%
West Virginia 74.10% SF Austin 25.90%
Pittsburgh 54.58% Wisconsin 45.42%
Xavier 81.93% Weber St 18.07%
Virginia 91.86% Hampton 8.14%
Butler 54.25% Texas Tech 45.75%
Purdue 73.95% Ark Little Rock 26.05%
Iowa St 74.52% Iona 25.48%
Gonzaga 51.98% Seton Hall 48.02%
Utah 76.29% Fresno St 23.71%
Syracuse 53.28% Dayton 46.72%
Michigan St 88.23% MTSU 11.77%

Second Round

Kansas 69.73% Connecticut 30.27%
Maryland 51.87% California 48.13%
Arizona 50.69% Miami FL 49.31%
Villanova 66.34% Iowa 33.66%
Oregon 59.89% Cincinnati 40.11%
Duke 53.83% Baylor 46.17%
Texas A&M 57.86% Texas 42.14%
Oklahoma 65.14% VA Commonwealth 34.86%
North Carolina 70.49% USC 29.51%
Kentucky 52.76% Indiana 47.24%
West Virginia 68.30% Notre Dame 31.70%
Xavier 59.40% Pittsburgh 40.60%
Virginia 66.83% Butler 33.17%
Purdue 55.01% Iowa St 44.99%
Utah 53.54% Gonzaga 46.46%
Michigan St 75.56% Syracuse 24.44%

Sweet Sixteen

Kansas 65.72% Maryland 34.28%
Villanova 59.12% Arizona 40.88%
Duke 51.58% Oregon 48.42%
Oklahoma 57.97% Texas A&M 42.03%
North Carolina 58.84% Kentucky 41.16%
West Virginia 59.09% Xavier 40.91%
Virginia 54.40% Purdue 45.60%
Michigan St 68.07% Utah 31.93%

Elite Eight

Kansas 53.32% Villanova 46.68%
Oklahoma 54.64% Duke 45.36%
North Carolina 52.13% West Virginia 47.87%
Michigan St 56.94% Virginia 43.06%

Final Four

Kansas 57.20% Oklahoma 42.80%
Michigan St 53.62% North Carolina 46.38%

Championship

Michigan St 51.71% Kansas 48.29%

 

Yet Another CBB Rating System

It’s here! March Madness, that time of year where we all try to predict how a bunch of 20-year-olds playing a game filled with randomness will perform. It’s fun! But we sure don’t bet real money on it, cause that would be illegal!

Actually guys, it’s not as much fun for me again this year, because, for the 3rd consecutive season, my team was not invited to keep playing. Someone please fix this. Sigh…

Anyway, this year I decided to build my own quantitative rating and prediction system to help me fill out my bracket. Here’s how my system works:

I define a numeric rating, R(i), for each team i. I also define the probability of team i defeating team j on a neutral court as

P(i->j) = R(i) / (R(i) + R(j)).

I then create a cost function, defined over all games played during the season, which is the sum over

[P(i->j) - GP(i->j)]^2,

where GP(i->j) is an estimated game probability that depends only on the result of each individual game. The GP probability value for each game must be provided as an input to the algorithm.

Of course, coming up with a good value for GP(i->j) is tricky, since we only have one occurrence (one result) for each game. The simplest method for determining GP(i->j) is to assign 100% probability if team i defeated team j, and 0% otherwise. But this ignores home/away, so it’s not ideal.

For my system, I decided to build a model based on the home/away-adjusted margin of victory for the game. If team i wins by a large margin, GP(i->j) will be close to 100%, but if it's a close game, the value will be closer to 50%. (The home/away factor adjusts the margin in favor of the road team by 3.5 points.)

To compute the final ratings for all teams, I initialize all teams with an equal rating (1.0) and perform an iterative optimization that minimizes the overall cost function with respect to the ratings. I use gradient descent as the optimization procedure.
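
For the curious, here's a stripped-down sketch of that optimization. It isn't my exact code, and the logistic margin-to-probability mapping shown is just one plausible choice for GP:

```python
# Stripped-down sketch of the rating optimization described above.
import numpy as np

def gp_from_margin(margin, scale=8.0):
    """Map a home/away-adjusted margin of victory to an estimated game
    probability GP(i->j). A logistic curve is one plausible choice."""
    return 1.0 / (1.0 + np.exp(-margin / scale))

def fit_ratings(games, n_teams, lr=0.05, n_iter=2000):
    """games: list of (i, j, gp) tuples with i, j team indices and gp = GP(i->j)."""
    r = np.ones(n_teams)                      # every team starts at rating 1.0
    for _ in range(n_iter):
        grad = np.zeros(n_teams)
        for i, j, gp in games:
            total = r[i] + r[j]
            p = r[i] / total                  # P(i->j) = R(i) / (R(i) + R(j))
            err = p - gp
            # dP/dR(i) = R(j)/total^2 and dP/dR(j) = -R(i)/total^2
            grad[i] += 2.0 * err * r[j] / total ** 2
            grad[j] -= 2.0 * err * r[i] / total ** 2
        r -= lr * grad
        r = np.maximum(r, 1e-6)               # keep ratings positive
    return r
```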

Below, I list my ratings for the top 100 teams in division 1. (My Marquette Golden Eagles just managed to sneak in at #99, woo!). Note that the value of a team’s rating carries no particular meaning by itself – it’s only useful when compared to the other team ratings.

Rank Team Rating
1 Kansas 8.779
2 Michigan St 8.317
3 North Carolina 8.174
4 Villanova 7.717
5 West Virginia 7.504
6 Virginia 7.149
7 Louisville 6.637
8 Oklahoma 6.602
9 Purdue 6.035
10 Kentucky 5.737
11 Duke 5.483
12 Arizona 5.342
13 Xavier 5.200
14 Miami FL 5.199
15 Oregon 5.151
16 Indiana 5.136
17 Iowa St 4.919
18 Texas A&M 4.802
19 Baylor 4.700
20 Maryland 4.601
21 SMU 4.514
22 Utah 4.400
23 Iowa 4.390
24 California 4.266
25 Vanderbilt 4.246
26 Wichita St 3.983
27 Gonzaga 3.842
28 Connecticut 3.827
29 Pittsburgh 3.552
30 Butler 3.543
31 Seton Hall 3.538
32 VA Commonwealth 3.531
33 Notre Dame 3.496
34 Texas 3.487
35 USC 3.465
36 Cincinnati 3.444
37 Florida 3.443
38 St Mary’s CA 3.221
39 Creighton 3.182
40 South Carolina 3.154
41 Kansas St 3.117
42 Michigan 3.107
43 Syracuse 3.057
44 Texas Tech 2.990
45 Colorado 2.969
46 Wisconsin 2.953
47 St Joseph’s PA 2.882
48 Washington 2.825
49 Florida St 2.791
50 Valparaiso 2.700
51 Dayton 2.682
52 Yale 2.671
53 SF Austin 2.620
54 Oregon St 2.617
55 Georgia Tech 2.589
56 Clemson 2.589
57 Providence 2.559
58 Northwestern 2.529
59 Ohio St 2.515
60 Georgia 2.467
61 Hawaii 2.400
62 San Diego St 2.388
63 BYU 2.353
64 Arkansas 2.330
65 UCLA 2.309
66 Tulsa 2.297
67 G Washington 2.284
68 Virginia Tech 2.209
69 Arizona St 2.182
70 Georgetown 2.167
71 Houston 2.114
72 Ark Little Rock 2.110
73 Rhode Island 2.089
74 Nebraska 2.072
75 UC Irvine 2.046
76 Mississippi 2.045
77 Stanford 2.029
78 LSU 2.023
79 Princeton 2.012
80 NC State 2.006
81 Memphis 2.003
82 Monmouth NJ 1.994
83 Evansville 1.949
84 Alabama 1.925
85 St Bonaventure 1.921
86 UNC Wilmington 1.896
87 Richmond 1.896
88 S Dakota St 1.887
89 Temple 1.884
90 Mississippi St 1.881
91 Stony Brook 1.806
92 Oklahoma St 1.794
93 Akron 1.787
94 Tennessee 1.764
95 Davidson 1.751
96 William & Mary 1.749
97 Santa Barbara 1.732
98 James Madison 1.731
99 Marquette 1.727
100 Iona 1.674

Ranting on the RPI

College basketball starts this week! It seems like it’s been a longer offseason than normal to me. I thought it would never get here…

I guess that’s just how it works when your team finishes (tied for) last in conference one year, but you expect big things from the upcoming season. MU has a top 10 ranked recruiting class, including the phenom Henry Ellenson, who many think will be a top 10 pick in this year’s NBA draft. So I’m super excited, but…

I’m a little concerned that we may have screwed ourselves when it comes to selection/seeding for the NCAA tournament. Our non-conference schedule is set up in such a way that we play a handful of games against quality opponents (Belmont, Iowa, LSU, NC State/ASU, Wisconsin), but then face a slew of total crap. I mean the real bottom feeders in division I. In December/January, we play 7 games against teams not expected to finish within the top 300. Barring some type of unpredictable catastrophe, we will win all those games. Easily.

But here's the problem. The NCAA selection committee uses the RPI as a ranking tool to guide them when selecting and seeding the tourney. It's a simple (really, simplistic) formula for rating teams, based on win percentage (25%), opponents' win percentage (50%), and opponents' opponents' win percentage (25%). And this rating system is flawed. The MU non-conference SOS (strength of schedule) will look terrible, and if you only look at the rating itself, it will look like MU didn't play a single decent team.
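
For reference, here's a bare-bones sketch of that formula. It's a simplification: the real RPI also weights home/road results and removes games against the team of record when computing opponents' win percentage:

```python
# Simplified RPI sketch: 25% win %, 50% opponents' win %, 25% opponents' opponents' win %.
def win_pct(record):
    wins, losses = record
    return wins / (wins + losses)

def simple_rpi(team, records, schedules):
    """records: {team: (wins, losses)}; schedules: {team: [opponents played]}."""
    opps = schedules[team]
    wp = win_pct(records[team])
    owp = sum(win_pct(records[o]) for o in opps) / len(opps)
    oowp = sum(
        sum(win_pct(records[oo]) for oo in schedules[o]) / len(schedules[o])
        for o in opps
    ) / len(opps)
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp
```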

I’m far from the first person to complain about the RPI and using it for selection/seeding. But I think it’s likely to anger me even more than usual this year due to my team’s schedule. And I want to add my own view, which I think differs somewhat from the most common sentiments.

The biggest criticism I’ve heard about the RPI is that it rewards achievement rather than performance – meaning a win is a win no matter how the 2 teams performed during the game. And it turns out that performance is a much better predictor than end result. The KenPom ratings are probably the most well-known system that rates teams based on performance. Another common criticism is that it weights SOS (75%) too much in the formula.

But there's another flaw with the RPI that I believe is even worse. Because the rating for a team is an aggregation of win/loss percentages over all games, the actual results of individual games don't even matter. And every single game is given equal weight, no matter how uninteresting the result. So, imagine a scenario where a bubble team, with a current RPI rank of 50, plays a bottom feeder, with an RPI rank of 325. Let's say the bubble team wins by 35 points. Expected result, right? This result wouldn't change any sane person's opinion of the bubble team. But what happens? The bubble team's RPI is almost guaranteed to drop by a significant amount. That makes no logical sense. I refuse to believe in the validity of a system that punishes teams for winning games they are expected to win.

Another consequence of this is that tournament teams are rewarded for scheduling a slew of mediocre competition, and punished for playing some really good teams along with a few really bad teams. The really bad opponents will kill your SOS, enough that playing those really good teams won't be enough to balance it out. And if you schedule a bunch of games against good teams, you'll probably lose some. Much safer, much better, to avoid playing teams that might beat you, as long as you also avoid the really terrible teams with atrocious records. No one can tell me this system is good for the game.

So what could we do to fix this problem? KenPom (and others) use performance, which includes margin of victory. But another way to deal with this issue, even without using scoring margins, is to weight games based on information.

When you weight games based on information, you’ll ignore results that tell you nothing you didn’t already know. Example: if UNC is currently ranked 5th, and they beat the 250th team, that result tells you nothing. So it should have no impact on the ratings. If, however, UNC loses to the 250th team, that tells you a lot, since it’s a big surprise, and should have a major impact. If UNC beats the 25th best team, that does tell you something, as #25 will sometimes beat #5, so that game should be given a moderate weight. But #25 beating #5 would be given an even higher weight, since it’s a more surprising result.
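
One simple way to formalize "information" is the surprisal of the result under some pre-game estimate of the winner's chances (from whatever rating system you trust). A sketch, with made-up probabilities for the examples above:

```python
# Sketch: weight each game by the surprisal (information content) of its result,
# given a pre-game estimate of the eventual winner's chance of winning.
import math

def game_weight(p_winner_wins):
    """Surprisal, in bits, of the observed result.
    p_winner_wins: pre-game probability that the team that actually won would win."""
    return -math.log2(p_winner_wins)

print(game_weight(0.99))  # ~0.01 bits: #5 beating #250 tells you almost nothing
print(game_weight(0.01))  # ~6.6 bits:  #250 beating #5 is a huge surprise
print(game_weight(0.70))  # ~0.5 bits:  #5 beating #25 carries moderate weight
```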

Comments on P-values

Noah Smith, a popular economics blogger, recently posted a rebuttal to criticism of the use of p-values in hypothesis testing. While he makes a few good points on why p-values and significance testing have value, I think his post fails to address a couple of major issues.

First, he states that good science involves replication of results. This is absolutely true and is, in my opinion, the best antidote for many of the issues related to significance testing. But from my experience in academia (I was an engineering grad student from 2003-2008), the real problem isn't a lack of good scientists; it's the poor system of incentives. This extends from the management of journals to the peer review process to the tenure system itself.

Because journals are reluctant to publish negative (non-significant) results, if dozens of independent groups perform similar studies, but only one of these shows significance, this single positive may be the only one published. In fact, the groups that found non-significance will probably not even attempt to publish their work, and no one will have any reason to believe that the lone positive result is false. In this case, no researcher has to do anything wrong in order to produce a bad conclusion by the field.

Also, the tenure system requires that professors continually publish papers in respected journals, which requires doing original work and finding interesting, previously unknown effects. Replicating studies that others have already accepted as legitimate (whether your own or not) gets you no closer to tenure.

The other major problem with p-values is the way they're interpreted. The common perception is that a p-value of 0.05 means there's a 95% chance the effect is real (non-random). But the p-value actually represents p(x|h0), where x is the data and h0 is the null hypothesis. What the researcher wants to know is p(h0|x). The first value (what you have) tells you the probability of observing the data you found, assuming that the null hypothesis is true. But you want to know the probability of the null hypothesis, given the data.

Bayes' theorem could be used to convert from the term you have to the one you want if you knew p(x), the overall (marginal) probability of observing the data. Unfortunately, there's no practical way to find this value. However, this paper does a nice job of setting bounds on the value of p(h0|x), depending on the form of the distribution of the data. An interesting result from this work is that for many types of tests, simply subtracting 1 from the t-stat will give you a decent approximation.
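
To make the gap between p(x|h0) and p(h0|x) concrete, here's a toy calculation. The prior and the power below are made-up numbers purely for illustration:

```python
# Toy illustration of p(h0|x) vs. p(x|h0). The prior and the power are made-up
# numbers purely for illustration.
p_x_given_h0 = 0.05  # the p-value: chance of data this extreme if the null is true
p_x_given_h1 = 0.80  # assumed power: chance of such data if the effect is real
p_h0 = 0.90          # assumed prior: say only 1 in 10 tested hypotheses is a real effect

p_h0_given_x = (p_x_given_h0 * p_h0) / (
    p_x_given_h0 * p_h0 + p_x_given_h1 * (1 - p_h0)
)
print(p_h0_given_x)  # 0.36 -- under these assumptions, a "significant" result still
                     # leaves a 36% chance the null is true, not 5%
```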