Tuesday, January 22, 2013

Team to Team Variation in Predictions

The current version of the Prediction Machine averages about 11 points of error in predicting games, across all teams and all seasons. I've speculated that there are some subsets of games where the error is significantly less -- for example, it might be the case that we can predict much more accurately when a good rebounding team plays a poor rebounding team. However, my efforts to identify those subsets have been largely futile, and there's some circumstantial evidence to suggest that no such subsets exist -- primarily that a Support Vector Machine does no better than a Linear Regression at prediction. (We would expect an SVM to do better on a data set with significant subsets, because it can fit different behavior in different regions of the data.)

Last week I thought it would be interesting to look at which teams the PM has done the best at predicting this year and which ones the worst. (For some reason, it's never occurred to me previously to look at this.) So I gathered up all the predictions and results for this season and segmented them out by team. (Note: this sample excludes the first 1000 games of the season and the most recent 100.)

The overall best team is Idaho, which the PM has predicted with about 4.6 points of error. The PM has gotten 5 of Idaho's games within 2 points. It missed one game by 10 points, but that was by far the worst.

The overall worst team is Mississippi St, with 18 points of error. The PM missed games by 35, 29, 17, and 15 points. So the overall range of predictions runs from less than 1/2 the average error to almost 2x the average.

I also took a look at the error for home games and away games separately.

Just looking at home games, the most predictable is TX Pan American (2.3 points error) and the worst is Youngstown State (28.7 points error). For away games only, the most predictable is Portland State (0.86 points error!) and the worst is Maryland (20.5 points error). For Portland State's four away games, the PM was off by 0.7, 1.3, 0.4 and 0.7 points (!). Maryland is a bit deceptive -- they only have two away games in the sample, and one was the Northwestern game, which they were expected to lose by 9 and instead won by 20.
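For anyone who wants to try the same breakdown, here is a minimal sketch of the segmentation in Python. It is not the PM's actual code: the function name `team_errors` and the tuple layout (home team, away team, predicted margin, actual margin, with both margins from the home team's point of view) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def team_errors(games):
    """Mean absolute prediction error per team, overall and by venue.

    `games` is an iterable of (home, away, predicted_margin, actual_margin)
    tuples. This layout is hypothetical; both margins are assumed to be
    measured from the home team's point of view.
    """
    buckets = defaultdict(lambda: {"all": [], "home": [], "away": []})
    for home, away, predicted, actual in games:
        err = abs(predicted - actual)
        # Every game counts toward both teams' overall error...
        buckets[home]["all"].append(err)
        buckets[away]["all"].append(err)
        # ...but toward only one venue bucket per team.
        buckets[home]["home"].append(err)
        buckets[away]["away"].append(err)
    # A team with no games in a bucket gets None rather than an average.
    return {
        team: {name: (mean(errs) if errs else None) for name, errs in splits.items()}
        for team, splits in buckets.items()
    }


def rank_teams(errors, split="all"):
    """Team names sorted from most to least predictable for one split."""
    scored = {t: e[split] for t, e in errors.items() if e[split] is not None}
    return sorted(scored, key=scored.get)
```

Calling `rank_teams(team_errors(games), "away")` gives the away-game ranking. As the Maryland case shows, it is worth checking how many games sit behind each average before reading much into the ends of the list.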

Some of this is no doubt just random variation. Just by chance the PM will get some teams' games close and some teams' games far off. That effect should diminish the more games we sample, so I took a look at the entire 2012 season.
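To get a feel for how much spread chance alone can produce, here is a small simulation sketch. It assumes every team is equally predictable, with errors drawn from a normal distribution scaled so the average absolute error is 11 points. The normal shape, the figure of 340 teams, and the function name `simulated_spread` are all assumptions for illustration, not anything taken from the PM.

```python
import math
import random
from statistics import mean


def simulated_spread(games_per_team, n_teams=340, avg_error=11.0, seed=0):
    """Best and worst per-team mean error when all teams are identical.

    Assumption: signed errors are normal with mean 0. For a normal
    distribution, mean |error| = sigma * sqrt(2 / pi), so sigma is chosen
    to make the long-run average absolute error equal `avg_error`.
    """
    rng = random.Random(seed)
    sigma = avg_error / math.sqrt(2 / math.pi)
    team_means = [
        mean([abs(rng.gauss(0.0, sigma)) for _ in range(games_per_team)])
        for _ in range(n_teams)
    ]
    return min(team_means), max(team_means)


if __name__ == "__main__":
    # Compare a partial season with a full one (game counts are illustrative).
    for n in (8, 30):
        best, worst = simulated_spread(n)
        print(f"{n} games per team: best {best:.1f}, worst {worst:.1f}")
```

Under these assumptions the gap between the best and worst team narrows as the number of games per team grows, even though no team is truly easier to predict than any other.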

The overall best team to predict in 2012 was Dartmouth, with 6.3 points of error on average. The worst was New Orleans, with 19.3 points of average error. The best home team to predict was Indiana St, at 4.7 points of error, and the worst was Longwood, at 17.15 points of error. The best and worst away teams were Gardner-Webb (4.14) and New Orleans (22). So even over a full season, the overall range of predictions still runs from about 1/2 the average error to about 2x the average error.

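Another test we can do is to look at how many teams that were predictable in the first half of the season are also predictable in the second half of the season. If predictability were a real property of a team, most teams in the predictable half early on should still be there later. If the effect is random, a team's first-half ranking tells us nothing about its second-half ranking, and we'd expect about half of them to carry over. For the 2012 season, if we look at the most predictable half of the teams in both the first part of the season and the second part of the season, there's almost exactly 50% overlap -- just what chance alone would produce, and a strong indication that the effect is random variation.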
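The overlap test is easy to sketch in code. The version below is an illustration rather than the script I actually ran: it takes two hypothetical dictionaries mapping team name to average error in each part of the season, and it also estimates the chance-level overlap by giving every team random, unrelated errors in both parts.

```python
import random


def most_predictable_half(errors):
    """Set of teams in the lower-error half of {team: average_error}."""
    ranked = sorted(errors, key=errors.get)
    return set(ranked[: len(ranked) // 2])


def overlap_fraction(first_part, second_part):
    """Share of first-part predictable teams that stay predictable later.

    Only teams present in both dictionaries are compared.
    """
    teams = first_part.keys() & second_part.keys()
    early = most_predictable_half({t: first_part[t] for t in teams})
    late = most_predictable_half({t: second_part[t] for t in teams})
    if not early:
        raise ValueError("need at least two teams in common")
    return len(early & late) / len(early)


def chance_overlap(n_teams, trials=200, seed=0):
    """Average overlap when both parts of the season are pure noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        first = {t: rng.random() for t in range(n_teams)}
        second = {t: rng.random() for t in range(n_teams)}
        total += overlap_fraction(first, second)
    return total / trials
```

An observed overlap well above the `chance_overlap` figure would suggest some teams really are easier to predict. An overlap close to it, as in the 2012 data, would not.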

The conclusion is that the error range on the PM's predictions for particular teams runs from about 1/2 the overall average to about 2 times the average, but that this variation is probably random.
