Sunday, July 24, 2011

Predicting Using Winning Percentages

I'm working on a journal article summarizing my recent findings, and I realized that I never tested a predictor based upon winning percentages.  This is an unfortunate oversight, because winning percentage is the most obvious and easiest measure of a team's strength, and it was the primary metric for selecting (at-large) Tournament participants before the development of RPI.  RPI itself was developed primarily to address the criticism that not all winning percentages are equal.  So I've corrected that oversight by going back and testing a predictor based upon winning percentages:

Predictor              % Correct    MOV Error
Winning Percentages    65.0%        11.63
RPI (infinite)         74.6%        11.33

Both results are interesting.  % Correct shows little improvement over simply picking the home team to win every game, which I take to be further confirmation of the strength of the home court advantage (HCA) in college basketball.  On the other hand, MOV Error is nearly as good as the best variant of RPI.
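A minimal sketch of a winning-percentage predictor, for illustration only.  The tuple layout `(home, away, home_score, away_score)`, the function names, and the rule that ties go to the home team are all assumptions, not the workflow actually used for the numbers above.

```python
from collections import defaultdict


def winning_percentages(games):
    """Compute wins / games played for every team.

    games: iterable of (home, away, home_score, away_score) tuples
    (an assumed layout, chosen for this sketch).
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_score, away_score in games:
        played[home] += 1
        played[away] += 1
        winner = home if home_score > away_score else away
        wins[winner] += 1
    return {team: wins[team] / played[team] for team in played}


def predict_winner(pct, home, away):
    """Pick the team with the higher winning percentage.

    Ties (and unseen teams, treated as 0.0) go to the home team;
    that tie-break is an assumption of this sketch.
    """
    return home if pct.get(home, 0.0) >= pct.get(away, 0.0) else away


def percent_correct(history, test_games):
    """Fraction of test games whose winner is predicted correctly."""
    pct = winning_percentages(history)
    hits = sum(
        predict_winner(pct, h, a) == (h if hs > as_ else a)
        for h, a, hs, as_ in test_games
    )
    return hits / len(test_games)
```

This only picks a winner; turning a winning-percentage gap into a predicted margin of victory would need an extra step (for example, a regression) that is not shown here.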

Friday, July 22, 2011

Possessions / Game

As mentioned in my previous post, I've been working lately on a game data set that includes game statistics like Field Goal Percentage, Offensive Rebounds, etc.  After some mucking about, I've imported all the data from my web scraping as well as the scraping done by Lee-Ming over at This Number Crunching Life.  Neither of us has the data from after the start of the Tournament last year, so I'm bugging Lee-Ming to scrape up that data and make his archive complete.  About 10-20% of the games are also missing, either because the scrapers failed to pick them up, or because the Yahoo! Sports statistics were incomplete.  Those problems aside, I have about 11K games in the training set, so there is still plenty to work with.

Apart from reading in the basic stats, the first thing I looked at was calculating the number of possessions in the game.  Possessions isn't a collected statistic, so we have to derive it from the statistics we have.  Unfortunately, it isn't possible to calculate the number exactly, so I'm using this formula:
Possessions = FGA - OReb + 0.475*FTA + TO
which I've stolen from Ken Pomeroy.  The "0.475" accounts for the fact that the first free throw of a "shooting two" situation doesn't end the possession.  (Ken Pomeroy is usually credited with the research indicating that 47.5% of free throws end possessions, but I've never been able to find that research.)  We know that each team in a game should have the same number of possessions (+/- one possession), so we calculate possessions for both teams, average the two, and use that number.
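The calculation above can be sketched as follows.  The function and parameter names are hypothetical, and the box-score numbers in the usage note are made up for illustration.

```python
def team_possessions(fga, oreb, fta, to):
    """Estimate one team's possessions from its box score.

    Implements: Possessions = FGA - OReb + 0.475*FTA + TO
    """
    return fga - oreb + 0.475 * fta + to


def game_possessions(team_a, team_b):
    """Average the two teams' estimates to get one number per game.

    team_a and team_b are dicts with keys fga, oreb, fta, to
    (an assumed layout for this sketch).
    """
    return (team_possessions(**team_a) + team_possessions(**team_b)) / 2
```

For example, with made-up box scores `dict(fga=60, oreb=10, fta=20, to=12)` and `dict(fga=55, oreb=8, fta=24, to=14)`, the two team estimates are roughly 71.5 and 72.4, and the game value is their average.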

We care about possessions because some statistics can only be interpreted in light of how many possessions a team had in the game.  For example, suppose a team grabbed 10 offensive rebounds.  That's a good performance in a game with 50 possessions; not so good in a game with 100 possessions.

Here are the stats for possessions over the games in the training set:

               Average    Maximum    Minimum
Possessions    68         121        46

That maximum of 121 piqued my interest, so I looked into the data and saw it was the 3/12/09 game between Connecticut and Syracuse -- a game that went to 6 overtimes.  That points out one shortcoming in my game data -- I'm not capturing OTs or the number of minutes played.  Hopefully that won't corrupt the value of the data too much.

(The #2 game was CSU-Fullerton vs. CSU Northridge on 2/13/10 -- a triple OT game with 108 possessions.  At the other end of the scale, Denver and LA Monroe played the 46-possession game on 1/11/09.  Interestingly, they were fairly efficient and scored a total of 105 points.  Illinois and Penn St. played a 57-possession game on 2/18/09 and only managed 71 total points -- a game that will no doubt be mentioned in some future post on offensive efficiency. :-) )
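To make the efficiency comparison concrete, here is a small sketch of points per possession for the two low-possession games mentioned above.  It assumes the possession count is per team (the averaged value described earlier), so the combined score is divided by twice that number; the function name is hypothetical.

```python
def points_per_possession(total_points, possessions):
    """Combined points for both teams divided by combined possessions.

    possessions is the per-team (averaged) count, so both teams
    together had roughly 2 * possessions.
    """
    return total_points / (2 * possessions)


# Denver vs. LA Monroe, 1/11/09: 105 total points in a 46-possession game
print(round(points_per_possession(105, 46), 3))  # prints 1.141

# Illinois vs. Penn St., 2/18/09: 71 total points in a 57-possession game
print(round(points_per_possession(71, 57), 3))   # prints 0.623
```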

Friday, July 15, 2011


I've been remiss in posting lately, mostly because it is summertime and I've simply had less time to work on this topic, but also because I've been doing some groundwork for the next stage of analysis.  All the work I've done here so far has used only game outcomes, and I've been able to use the excellent and complete set of game scores collected by Lee over at Number Crunching Life.  However, for the next stage of analysis I want to look at game statistics like Field Goal Percentage, Offensive Rebounds, etc.  I have partially collected these stats for the last two seasons, but I need to quality-check the data I have, fill in missing games and so on.  I have a crawler written using Web Harvest, but it seems to sporadically miss games.  The crawler from Lee is written in C# and doesn't do exactly what I want, which presents its own problems.

I should probably also fire off an email to the folks at Yahoo! Sports and see if they won't just make an archive available in some convenient format.

Friday, July 8, 2011

Experimenting with Recency

I wanted to take another look at the relative value of recent games, that is, whether prediction is more accurate if we base it upon only a team's recent performance instead of the entire season-to-date.  To do this, I calculated the Govan ratings for each team based upon the last "N" games, for various values of N.  Here are the results:

Predictor    N    MOV Error    % Correct

With N=30, this is essentially the same as using all the games to date.  Smaller values of N throw out the oldest games to use only N games.  The trend is clear; it seems that even the oldest games add useful information for prediction.
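A minimal sketch of the "last N games" window, for illustration.  The Govan rating itself is not reproduced here; average margin of victory stands in as a placeholder rating, and the tuple layout and function names are assumptions.

```python
from collections import defaultdict, deque


def recent_margins_by_team(games, n):
    """Keep each team's margins from only its most recent n games.

    games: (home, away, home_score, away_score) tuples in date order
    (an assumed layout).  A deque with maxlen=n silently drops the
    oldest game once a team has played more than n.
    """
    recent = defaultdict(lambda: deque(maxlen=n))
    for home, away, home_score, away_score in games:
        margin = home_score - away_score
        recent[home].append(margin)
        recent[away].append(-margin)
    return recent


def average_margin(recent):
    """Placeholder rating: mean margin over the retained games.

    This is a stand-in for illustration, not the Govan rating.
    """
    return {team: sum(m) / len(m) for team, m in recent.items()}
```

Setting `n` at or above the number of games a team has played reproduces the "all games to date" case.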

If using games from the very beginning of the season is useful in predicting games at the end of the season, perhaps using even older games would be useful.  Perhaps we should include (say) the previous season's games when predicting.   ("That's crazy talk!" I hear you say.  We shall see...)

It's fairly straightforward to modify my workflow so that it doesn't reset at the beginning of each new season.  I currently use games from the 2008-2009, 2009-2010 and 2010-2011 seasons.  If we run without resetting at the beginning of each new season (essentially treating the data as one long season), this is the comparative performance for our favorite two predictors:

Predictor                     MOV Error    % Correct
Govan (normal)                10.80        73.5%
Govan (merged seasons)        11.04        73.4%
TrueSkill (normal)            10.88        73.3%
TrueSkill (merged seasons)    10.99        73.4%

Interestingly, performance does not suffer much from including previous seasons.  As you might expect, TrueSkill suffers less of a hit than Govan.   Since TrueSkill essentially updates a hypothesis with every game, it's better able to discard the contrary evidence from earlier games.

Now let's look at performance within a season, as I did in this post.  Here I've broken the season down into four quarters of ~1000 games each, and I show the performance both with and without using the previous season's games.  (This is really the last four quintiles of the season -- in both cases I throw away the first 1000 games of each season.)

Quarter of the Season

Not surprisingly, using the previous season improves prediction during the first quarter of the season.  (TrueSkill shows the same pattern, improving by about 2.2% in the first quarter.)  Even though the previous season's performance isn't a good predictor, it is apparently better than starting everyone off with a clean slate.

This suggests that we should use the previous season's data until we have enough current season data to make good predictions.  We should be able to do this by combining the recent games window and the merged seasons.  If we always predict using the (say) previous 30 games, and include the previous season, that should be close to what we want, although it might not throw away the previous season's games fast enough.  Sadly, I can't seem to get that to work (or rather, it works, but returns very bad results, which makes me think it's broken).

As an alternative, we can "prime" each season by including the last 1000 games of the previous season.  This has the disadvantage that the primer games impact the ratings for the whole season, but it's a simple approach and easy to implement:

Predictor             MOV Error    % Correct
Govan (normal)        10.80        73.5%
Govan (primed)        10.78        73.1%
TrueSkill (normal)    10.88        73.3%
TrueSkill (primed)    10.91        73.1%
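The priming step can be sketched as below.  This is an illustration under assumptions: each season is a date-ordered list of games, the function name is hypothetical, and the returned index marks where the current season's games (the ones to be predicted and scored) begin.

```python
def primed_schedule(previous_season, current_season, primer_size=1000):
    """Prepend the last primer_size games of the previous season.

    Returns (games, start): ratings are built over all of `games`,
    but only games[start:] belong to the current season.
    """
    # Guard against primer_size == 0, since seq[-0:] is the whole list.
    primer = list(previous_season[-primer_size:]) if primer_size > 0 else []
    return primer + list(current_season), len(primer)
```

Because the primer games stay in the rating history, they keep influencing the ratings for the rest of the season, which is the disadvantage noted above.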

Again, if we look at the performance by quarters, we see that performance in the first quarter games is much improved, but that's offset by poorer performance in the later quarters.  Ideally, we would drop the primer games once the current season has produced enough games of its own.  That's a little trickier to implement, because it isn't always possible to "back out" a game from the ratings.  We can address that by "restarting" partway through the season without the primer games.  Here's what that yields:

Predictor                       MOV Error    % Correct
Govan (normal)                  10.80        73.5%
Govan (primed + restart)        10.74        70.7%
TrueSkill (normal)              10.88        73.3%
TrueSkill (primed + restart)    10.94        70.5%

The result is a mixed bag.  MOV Error improves by about 0.5% for Govan (which is really a 2% improvement in the first quarter games), but worsens by a similar amount for TrueSkill.  Meanwhile, % Correct drops rather dramatically for both ratings.

In general, the evidence suggests that there is no benefit to limiting the number of current season games when calculating ratings, and there is mixed benefit in early season games from incorporating results from the previous season.

Wednesday, July 6, 2011

New Papers

A quick posting to highlight a couple of interesting papers.

The first is "A New Bayesian Rating System for Team Competitions" from ICML 2011.  (Special thanks to Henry Ware for pointing me at the paper.)  This paper addresses two shortcomings of vanilla TrueSkill:  (1) multi-way tie games, and (2) how individuals contribute to team performance.  The authors are able to show significantly better performance in their problem domain with some fairly straightforward modifications to TrueSkill.  The results don't really apply to what I'm looking at right now, but would potentially apply to modeling team performance as a function of all the individual players.  I'm not convinced there's much to be gained by trying to model individual basketball players.  I foresee at least two significant problems.  The first is that there are not enough games to build good individual models of the players (although bridging seasons might help with this).  The second is that I don't believe that team makeup changes significantly enough during the season to make tracking individuals worthwhile -- but I haven't done any analysis to see whether that intuition is correct or not.  But inasmuch as players don't contribute in a simplistic additive way to team performance, the work here on alternate methods for "summing" individual contributions would probably be very relevant.

The second paper is "An empirical comparison of supervised learning algorithms."  This paper surveys a wide variety of learning algorithms (e.g., neural networks, SVMs, naive Bayes, etc.) over a set of sample problems and categorizes their effectiveness.  Obviously the effectiveness of the algorithms depends (somewhat) on the problem set, but this paper presents some convincing evidence that boosted, calibrated decision trees, neural networks, and SVMs are the most effective available algorithms.  The paper is an interesting read, and the author's web site is also worth a look.

Sunday, July 3, 2011

Conference Vs. Non-Conference

As a follow-on to this posting where I looked at how well the predictors worked over different periods of the season and noted some odd behaviors, I looked at conference versus non-conference games, to see if there was some difference in performance.  Here are the results:

Predictor                            MOV Error    % Correct
TrueSkill -- Conference Games        10.689       72.10%
TrueSkill -- Non-Conference Games    11.539       76.39%
Govan -- Conference Games            10.583       72.17%
Govan -- Non-Conference Games        11.325       76.83%

This shows the same sort of split we see in early-season versus late-season predictions:  conference predictions are closer to the actual Margin of Victory, but more often on the wrong side of the contest.  Of course, most non-conference games are also early-season games, so there's some interdependence in these results.  We can try to factor this out by looking at late-season non-conference games and early-season conference games:

Predictor                                 MOV Error    % Correct
TrueSkill -- Conference Games             10.689       72.10%
TrueSkill -- Early Conference Games       11.485       66.67%
TrueSkill -- Non-Conference Games         11.539       76.39%
TrueSkill -- Late Non-Conference Games    11.148       72.68%
Govan -- Conference Games                 10.583       72.17%
Govan -- Early Conference Games           10.988       69.57%
Govan -- Non-Conference Games             11.325       76.83%
Govan -- Late Non-Conference Games        11.089       73.04%

This seems to show that it's really the early season versus late season that matters (and the predictors are particularly bad at early-season conference games).
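A sketch of how the two metrics might be computed per category of game.  It assumes MOV Error is the mean absolute difference between predicted and actual margin (the exact definition is not restated in this post), and that margins are home score minus away score; the names and the sample rows are hypothetical.

```python
def evaluate(predictions):
    """Return (mov_error, pct_correct) for (predicted, actual) margins.

    mov_error: mean absolute error of the predicted margin (assumed
    definition).  pct_correct: fraction of games where the predicted
    margin has the same sign as the actual margin.
    """
    n = len(predictions)
    mov_error = sum(abs(p - a) for p, a in predictions) / n
    pct_correct = sum((p > 0) == (a > 0) for p, a in predictions) / n
    return mov_error, pct_correct


def evaluate_by_group(rows):
    """rows: (group_label, predicted_margin, actual_margin) tuples."""
    groups = {}
    for label, predicted, actual in rows:
        groups.setdefault(label, []).append((predicted, actual))
    return {label: evaluate(preds) for label, preds in groups.items()}


# Made-up rows, purely to show that one group can have the lower
# MOV Error while also having the lower fraction correct.
rows = [
    ("conference", 5, 3),
    ("conference", -2, 4),
    ("non-conference", 10, 20),
    ("non-conference", -3, -1),
]
results = evaluate_by_group(rows)
```

With these made-up rows, the conference group has the smaller margin error but picks fewer winners, which is the same shape as the split seen in the tables above.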

One easy follow-up to this experiment is to train our predictor on only conference games (or only non-conference games) and see if this improves performance within the category.  I won't post the numbers, but there's no advantage in training on non-conference games when predicting non-conference games, etc.

None of this explains the oddity of MOV Error improving while % Correct simultaneously worsens, but at least there's some evidence that conference versus non-conference is probably not a big factor in prediction performance.