## Tuesday, January 17, 2012

I spent the last few days scraping game data, dusting off code and generally getting the basketball predictor back online. The current version of the predictor uses an average of 4 linear regressions. These models are based upon: (1) the Govan rating, (2) the TrueSkill rating, (3) a Batch Gradient Descent (BGD) rating, and (4) a rating based on a wide variety of statistical measures (such as "offensive rebounds per possession"). Individually, each of these models has an RMSE of less than 11 on my test corpus. Unfortunately, they're all highly correlated, so the combined model doesn't do any better than the best of the underlying models. Currently it has an RMSE of 10.79 on my test corpus.
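As a rough illustration of what "an average of 4 linear regressions" and RMSE mean here, the sketch below averages per-game margin predictions from several models and scores the result. The function names `rmse` and `ensemble` and all of the numbers are made up for the example; they are not taken from the actual predictor.

```python
import math

def rmse(predictions, actuals):
    """Root-mean-square error between predicted and actual margins of victory."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def ensemble(model_predictions):
    """Average the per-game predictions of several models (one list per model)."""
    return [sum(game) / len(game) for game in zip(*model_predictions)]

# Hypothetical predicted margins for three games from four highly correlated models.
model_predictions = [
    [5.0, -3.0, 10.0],
    [6.0, -2.0, 9.0],
    [4.0, -4.0, 11.0],
    [5.0, -3.0, 10.0],
]
# Hypothetical actual margins of victory for the same three games.
actual_mov = [8.0, -1.0, 4.0]

combined = ensemble(model_predictions)
for i, preds in enumerate(model_predictions, start=1):
    print(f"model {i} RMSE: {rmse(preds, actual_mov):.2f}")
print(f"combined RMSE: {rmse(combined, actual_mov):.2f}")
```

Because the toy models all miss in the same direction on each game, averaging them cannot cancel much error, which is the same effect that correlated models have on the combined predictor.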

During the season I compare the model predictions against the line and "bet" games where the prediction differs significantly from the line. "Significantly" is a relative term. When I first started doing this, my model often differed from the line by 10 points or more. As the model has improved, those differences have narrowed considerably. (As would be expected. The line is usually the best predictor.) In my testing so far this year, I've only seen a difference of more than 5 points once. There is some good mathematical work on sizing wagers based upon bankroll, perceived advantage, etc., but I've gone to a simple approach of betting $10 with an advantage of less than 5 points and $20 with an advantage of more than 5 points. (Adopted after the 1/14 games shown below.)
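The sizing rule can be sketched in a few lines. This is only an illustration: the function names are invented, the advantage is taken to be the prediction minus the line (both expressed as the home team's margin), and an advantage of exactly 5 points, which the rule above doesn't cover, is assumed to get the smaller stake.

```python
def advantage(line, prediction):
    """Signed gap between the model and the line, in points of home-team margin."""
    return prediction - line

def side(line, prediction):
    """Positive advantage favors the home team against the line, negative the away team."""
    return "home" if advantage(line, prediction) > 0 else "away"

def stake(line, prediction, small=10, large=20, threshold=5.0):
    """Return the dollar amount to risk under the simple two-tier rule.

    Assumption: an advantage of exactly `threshold` points gets the small stake.
    """
    return large if abs(advantage(line, prediction)) > threshold else small

# Example: the line has the home team by 13.5, the model says 9.1.
print(side(13.5, 9.1), stake(13.5, 9.1))
```

A bankroll-based scheme would replace `stake` with something that scales with the perceived edge, but the two-tier version is easy to reason about.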

Here are the games the model has "bet" so far (no real money was harmed):

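| Date | Home | Score | Away | Score | MOV | Line | Pred | Adv | Risk | Win | Result | Won | v.Line |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1/14 | Tennessee St. | 52 | SIU Edwardsville | 49 | 3 | 16 | 8.8 | -7.2 | 20 | 17.39 | 17.39 | 1 | 1 |
| 1/14 | LA Lafayette | 87 | Florida Intl. | 81 | 6 | 10 | 5.1 | -4.9 | 20 | 19.05 | 19.05 | 1 | 1 |
| 1/14 | Murray St. | 81 | Tennessee Tech | 73 | 8 | 12 | 16.5 | 4.5 | 20 | 18.18 | -20 | 1 | 0 |
| 1/14 | Houston | 55 | Memphis | 89 | -34 | -8.5 | -4.2 | 4.3 | 20 | 17.39 | -20 | 1 | 0 |
| 1/15 | Ohio St. | 80 | Indiana | 63 | 17 | 13.5 | 9.1 | -4.4 | 10 | 9.09 | -10 | 1 | 0 |
| 1/15 | Bradley | 78 | Northern Iowa | 67 | 11 | -10 | -7.2 | 2.8 | 10 | 8.70 | 8.70 | 0 | 1 |
| 1/15 | USC | 47 | UCLA | 66 | -19 | 2 | 1.5 | -0.5 | 10 | 9.09 | 9.09 | 0 | 1 |
| 1/16 | Syracuse | 71 | Pittsburgh | 63 | 8 | 13.5 | 17.3 | 3.8 | 10 | 9.09 | -10 | 1 | 0 |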

So far this season the model is 50% against the line (and consequently down about $5) and 75% picking the correct outcome. The (evolving) model picked 38 games last year, and over the two seasons so far is at a 63% win percentage and 60% versus the line (+$133). Both are probably short-term aberrations: the model has a 74% win percentage when tested against my corpus of 12K games.
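For anyone who wants to check the season-to-date numbers, they fall straight out of the Result, Won and v.Line columns of the table above. A small sketch (the variable names are just for illustration):

```python
# (result in dollars, picked the winner, beat the line), copied from the table
bets = [
    (17.39, 1, 1),   # Tennessee St. vs. SIU Edwardsville
    (19.05, 1, 1),   # LA Lafayette vs. Florida Intl.
    (-20.00, 1, 0),  # Murray St. vs. Tennessee Tech
    (-20.00, 1, 0),  # Houston vs. Memphis
    (-10.00, 1, 0),  # Ohio St. vs. Indiana
    (8.70, 0, 1),    # Bradley vs. Northern Iowa
    (9.09, 0, 1),    # USC vs. UCLA
    (-10.00, 1, 0),  # Syracuse vs. Pittsburgh
]

net = sum(result for result, _, _ in bets)
win_pct = sum(won for _, won, _ in bets) / len(bets)
vs_line_pct = sum(v_line for _, _, v_line in bets) / len(bets)

print(f"net: {net:+.2f}  winners: {win_pct:.0%}  vs. line: {vs_line_pct:.0%}")
# prints: net: -5.77  winners: 75%  vs. line: 50%
```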

I won't generally be posting predictions, but I will try to summarize the model's performance a few times during the season, as I'm sure it makes for interesting reading :-).