It’s here! March Madness, that time of year where we all try to predict how a bunch of 20-year-olds playing a game filled with randomness will perform. It’s fun! But we sure don’t bet real money on it, cause that would be illegal!
Actually guys, it’s not as much fun for me again this year, because, for the 3rd consecutive season, my team was not invited to keep playing. Someone please fix this. Sigh…
Anyway, this year I decided to build my own quantitative rating and prediction system to help me fill out my bracket. Here’s how my system works:
I define a numeric rating, R(i), for each team i. I also define the probability of team i defeating team j on a neutral court as
P(i->j) = R(i) / (R(i) + R(j)).
I then create a cost function, defined over all games played during the season, which is the sum over
[P(i->j) – GP(i->j)]^2,
where GP(i->j) is an estimated game probability that depends only on the result of each individual game. The GP probability value for each game must be provided as an input to the algorithm.
Of course, coming up with a good value for GP(i->j) is tricky, since we only have one occurrence (one result) for each game. The simplest method for determining GP(i->j) is to assign 100% probability if team i defeated team j, and 0% otherwise. But this ignores home/away, so it’s not ideal.
For my system, I decided to build a model based on the home/away adjusted margin of victory for the game. If team i wins by a large margin, P(i->j) will be close to 100%, but if it’s a close game, the value will be closer to 50%. (The home/away factor adjusts the margin in favor of the road team by 3.5 points)
To compute the final ratings for all teams, I initialize all teams with an equal rating (1.0) and perform an iterative optimization that minimizes the overall cost function with respect to the ratings. I use gradient descent as the optimization procedure.
Below, I list my ratings for the top 100 teams in division 1. (My Marquette Golden Eagles just managed to sneak in at #99, woo!). Note that the value of a team’s rating carries no particular meaning by itself – it’s only useful when compared to the other team ratings.
|38||St Mary’s CA||3.221|
|47||St Joseph’s PA||2.882|
|62||San Diego St||2.388|
|72||Ark Little Rock||2.110|
|88||S Dakota St||1.887|
|96||William & Mary||1.749|