Friday, October 28, 2011

Football Predictions

Here are college football predictions for this week.  I found a couple of bugs in my input data since last week's predictions, so these should be somewhat better.  Apologies as always for the old-school formatting, and heed my Disclaimer as well.

+--------------------+--------------------+--------------------+
|Home team           |Away team           |Predicted MOV       |
+--------------------+--------------------+--------------------+
|ohio state          |wisconsin           |-11.7               |
+--------------------+--------------------+--------------------+
|western michigan    |ball state          |15.5                |
+--------------------+--------------------+--------------------+
|washington          |arizona             |7.4                 |
+--------------------+--------------------+--------------------+
|duke                |virginia tech       |-4.3                |
+--------------------+--------------------+--------------------+
|utah                |oregon state        |1.5                 |
+--------------------+--------------------+--------------------+
|ucla                |california          |1.7                 |
+--------------------+--------------------+--------------------+
|central florida     |memphis             |26.0                |
+--------------------+--------------------+--------------------+
|tulsa               |southern methodist  |5.4                 |
+--------------------+--------------------+--------------------+
|texas tech          |iowa state          |18.5                |
+--------------------+--------------------+--------------------+
|texas a&m           |missouri            |16.1                |
+--------------------+--------------------+--------------------+
|texas               |kansas              |22.7                |
+--------------------+--------------------+--------------------+
|southern california |stanford            |-10.0               |
+--------------------+--------------------+--------------------+
|texas-el paso       |southern mississippi|-8.1                |
+--------------------+--------------------+--------------------+
|tennessee           |south carolina      |-0.1                |
+--------------------+--------------------+--------------------+
|san diego state     |wyoming             |20.3                |
+--------------------+--------------------+--------------------+
|rutgers             |west virginia       |2.5                 |
+--------------------+--------------------+--------------------+
|penn state          |illinois            |5.0                 |
+--------------------+--------------------+--------------------+
|oregon              |washington state    |24.9                |
+--------------------+--------------------+--------------------+
|oklahoma state      |baylor              |7.3                 |
+--------------------+--------------------+--------------------+
|notre dame          |navy                |17.6                |
+--------------------+--------------------+--------------------+
|indiana             |northwestern        |-5.7                |
+--------------------+--------------------+--------------------+
|north carolina      |wake forest         |5.3                 |
+--------------------+--------------------+--------------------+
|new mexico state    |nevada              |-6.2                |
+--------------------+--------------------+--------------------+
|nebraska            |michigan state      |-6.6                |
+--------------------+--------------------+--------------------+
|kentucky            |mississippi state   |-11.1               |
+--------------------+--------------------+--------------------+
|michigan            |purdue              |22.1                |
+--------------------+--------------------+--------------------+
|miami (ohio)        |buffalo             |1.0                 |
+--------------------+--------------------+--------------------+
|maryland            |boston college      |6.8                 |
+--------------------+--------------------+--------------------+
|marshall            |alabama-birmingham  |14.2                |
+--------------------+--------------------+--------------------+
|louisville          |syracuse            |-5.3                |
+--------------------+--------------------+--------------------+
|louisiana tech      |san jose state      |11.6                |
+--------------------+--------------------+--------------------+
|louisiana-monroe    |western kentucky    |-6.2                |
+--------------------+--------------------+--------------------+
|middle tennessee    |louisiana-lafayette |4.7                 |
|state               |                    |                    |
+--------------------+--------------------+--------------------+
|kansas state        |oklahoma            |-4.0                |
+--------------------+--------------------+--------------------+
|minnesota           |iowa                |-13.5               |
+--------------------+--------------------+--------------------+
|idaho               |hawaii              |-8.0                |
+--------------------+--------------------+--------------------+
|florida             |georgia             |2.4                 |
+--------------------+--------------------+--------------------+
|florida state       |north carolina state|13.2                |
+--------------------+--------------------+--------------------+
|east carolina       |tulane              |9.9                 |
+--------------------+--------------------+--------------------+
|nevada-las vegas    |colorado state      |1.3                 |
+--------------------+--------------------+--------------------+
|georgia tech        |clemson             |-4.4                |
+--------------------+--------------------+--------------------+
|akron               |central michigan    |-1.8                |
+--------------------+--------------------+--------------------+
|kent                |bowling green state |-5.2                |
+--------------------+--------------------+--------------------+
|auburn              |mississippi         |11.5                |
+--------------------+--------------------+--------------------+
|arkansas state      |north texas         |15.2                |
+--------------------+--------------------+--------------------+
|vanderbilt          |arkansas            |-5.3                |
+--------------------+--------------------+--------------------+
|arizona state       |colorado            |26.3                |
+--------------------+--------------------+--------------------+
|new mexico          |air force           |-16.8               |
+--------------------+--------------------+--------------------+
|florida             |troy                |11.7                |
|international       |                    |                    |
+--------------------+--------------------+--------------------+
|pittsburgh          |connecticut         |11.4                |
+--------------------+--------------------+--------------------+
|brigham young       |texas christian     |-10.1               |
+--------------------+--------------------+--------------------+
|miami (florida)     |virginia            |15.8                |
+--------------------+--------------------+--------------------+
|houston             |rice                |21.5                |
+--------------------+--------------------+--------------------+

Thursday, October 20, 2011

Predicting the Oblong Ball

I was recently challenged by some friends to predict NCAA college football, so I gathered up some historical data from this archive and adapted some of the better rating systems I've investigated to create a predictor.  It's hard to judge how well it performs.  According to my standard cross-validation testing it does not do as well as the systems reported here, but my implementation of Sagarin's ELO also falls short of its reported performance.  Since my implementation of ELO tracks the Sagarin performance very well in basketball, I suspect there's a systematic difference in how performance is measured.

At any rate, I don't intend to spend a lot of time on this, but just for amusement, here are the predictions for this week's games:

alabama over tennessee by 10.6
arkansas over mississippi by 14.8
ball state over central michigan by 1.8
boise state over air force by 25.7
california over utah by -11.4
central florida over alabama-birmingham by 21.2
clemson over north carolina by 5.1
florida atlantic over middle tennessee state by -12.4
florida state over maryland by 5.9
hawaii over new mexico state by 0.5
houston over marshall by 12.9
illinois over purdue by 11.0
iowa over indiana by 6.5
kansas state over kansas by 18.7
louisiana state over auburn by 14.4
louisiana-lafayette over western kentucky by 6.5
miami (florida) over georgia tech by -15.4
navy over east carolina by 5.2
nebraska over minnesota by 14.4
nevada over fresno state by 7.8
north texas over louisiana-monroe by 2.7
northern illinois over buffalo by 3.1
notre dame over southern california by 2.5
ohio over akron by 21.1
oklahoma state over missouri by 9.9
oklahoma over texas tech by 9.4
oregon over colorado by 22.8
penn state over northwestern by 10.9
rutgers over louisville by 12.0
south florida over cincinnati by -6.9
southern mississippi over southern methodist by -4.4
stanford over washington by 17.7
temple over bowling green state by 17.2
texas a&m over iowa state by 14.5
texas christian over new mexico by 21.9
texas-el paso over colorado state by -1.8
toledo over miami (ohio) by 16.4
tulane over memphis by 10.2
tulsa over rice by 4.4
ucla over arizona by 1.6
utah state over louisiana tech by -2.9
vanderbilt over army by 5.1
virginia tech over boston college by 15.6
virginia over north carolina state by -4.4
wake forest over duke by -2.3
washington state over oregon state by 5.2
west virginia over syracuse by 3.2
western michigan over eastern michigan by 13.7
wisconsin over michigan state by 7.0

Apologies for the awful formatting -- I put this together in 3 days and didn't put much effort into making it pretty.

The Usual Disclaimers apply:  Use this information at your own risk; it is not intended for gambling purposes and the Net Prophet does not encourage or recommend gambling on sports events.

Wednesday, October 12, 2011

More on Statistical Prediction

I am continuing to explore statistical prediction.  In particular, after implementing the Four Factors as described here, I became interested in examining other statistics generated from the base set of statistics.  A subset of these generated statistics are ratios of the base statistics, like the "Offensive Balance" statistic I defined in my earlier post:
Offensive Balance = (# 3 Pt Attempts) / (# FG Attempts)
You can probably come up with a few sensible statistics like these off the top of your head.  But since I've seen time and again the value of exploring all options -- even the ones that make no "sense" -- I decided to calculate and test all of these sorts of ratios to see which of them (if any) have predictive value.

That's a more difficult job than you might imagine.  In my data sets there are 13 base statistics per team per game (FG Made, FG Attempted, 3PT Made, 3PT Attempted, FT Made, FT Attempted, Offensive Rebounds, Total Rebounds, Assists, Turnovers, Steals, Fouls, Score, and MOV).  For predictive purposes, we want to use the average of these over a team's previous games [1], and we can average either per game or per possession, so that's 26 base statistics per team.  There are 26*25 = 650 possible ratios of those statistics.  But we also want to consider ratios not only of a team with itself but also of the team with its opponent, e.g., the ratio of the team's average number of 3 PT attempts in past games to its opponent's average number of 3 PT attempts in past games.  That adds another 26*26 = 676 possible ratios.  Finally, we also want to consider the statistics for a team's past opponents, e.g., the average number of 3 PT attempts in past games of a team's opponents in those games.  Adding those in creates a lot more ratios.  Multiply all that by the 12K games in my training data, and it's a lot of data.
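The counts above are easy to double-check with a quick sketch.  This takes the post's figure of 13 base statistics at face value and uses placeholder names (stat0, stat1, ...) rather than my real column names; it only reproduces the arithmetic, not the actual data pipeline.

```python
from itertools import permutations, product

# Placeholder names: 13 base statistics, each averaged two ways.
base_stats = [f"stat{i}" for i in range(13)]
averaging_modes = ["per_game", "per_possession"]
team_stats = [f"{s}_{m}" for s in base_stats for m in averaging_modes]

# Ratios within one team: ordered pairs of two different statistics.
self_ratios = list(permutations(team_stats, 2))

# Ratios of a team's statistic to its opponent's statistic: any ordered
# pair, including the same statistic on both sides (e.g. 3PT att / 3PT att).
cross_ratios = list(product(team_stats, repeat=2))

print(len(team_stats), len(self_ratios), len(cross_ratios))
```

The within-team count excludes a statistic divided by itself (always 1), which is why it is 26*25 rather than 26*26.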

My approach is to generate a subset of the possible ratios and test them for predictive value.  For various reasons I settled on generating all the ratios with a particular numerator, e.g.,
(FG Made) / (# Fouls)
(FG Made) / (Opponent's # Fouls)
(FG Made) / (# Fouls by Opponents in Past Games)
etc.
This ends up adding about 96 new statistics to every game in the database.  I can then take this expanded data and pump it through the usual linear regressions, etc., to find the statistics that have predictive value.  But this is a slow process -- for each numerator, it takes hours to generate all the statistics and run them through iterations of the predictive model.  (Working one numerator at a time also has the disadvantage that I may miss generated statistics with different numerators that are only valuable in combination.)
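As a rough illustration of the "all ratios with one numerator" idea, here is a minimal sketch.  The function name, the statistic names, and the numbers are all made up for the example, and skipping zero denominators is just one simple way to avoid division errors, not necessarily what my real code does.

```python
def ratios_for_numerator(stats, numerator):
    """Build every ratio numerator/denominator from one game's averaged statistics.

    `stats` maps a (hypothetical) statistic name to its averaged value.
    Denominators equal to zero are skipped to avoid division errors.
    """
    top = stats[numerator]
    features = {}
    for name, value in stats.items():
        if name == numerator or value == 0:
            continue
        features[f"{numerator}/{name}"] = top / value
    return features


# Made-up averaged values for one team going into one game (illustration only).
game = {
    "fg_made": 24.0,
    "fouls": 16.0,
    "opp_fouls": 20.0,
    "past_opp_fouls": 12.0,
    "steals": 0.0,
}

for name, value in ratios_for_numerator(game, "fg_made").items():
    print(f"{name} = {value:.2f}")
```

Each generated ratio becomes one more column for that game, and the expanded table is what gets fed to the regression.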

So far, I haven't identified any ratios that result in significantly better predictions.  But I have been surprised that (at least so far) the models have selected a number of unexpected ratios as being of value.  For example:
(Away team's Average FG Made) / (Away team's Average 3PTs Attempted)
(Away team's Average FG Made) / (Away team's Average 3PTs Made)
These ratios seem to be capturing something about the Away team's offensive balance between inside and outside play.  Interestingly, both the ratio with 3 PTs Attempted and the ratio with 3 PTs Made are significant -- it may be that the first captures the "offensive strategy" (whether a team plays outside first or inside first) and the second captures something about how effective the team is at executing that strategy.  It's also interesting that these ratios are only significant for the Away team -- apparently the home team's performance doesn't depend strongly on what sort of offensive strategy it uses.

Another interesting statistic:
(Home team's Average FG Made) / (Home team's Past Opponents' Average Offensive Rebounds)
It takes a moment's thought to grasp this statistic.  It compares the average number of FGs made by a team to the offensive rebounding of the opponents the team faced.  If we take Offensive Rebounds as an indicator of how strongly teams are contesting inside play, then this ratio would seem to say something about how effective the home team's inside play has been relative to its opponents.

Hopefully working through all the ratio statistics will turn up a set of statistics that provide significantly better predictive value.

[1] Averaging isn't the only option here, and there are other possibilities for generated statistics that might be useful, but I feel that ratios are a reasonably fertile area for exploration.