Net Prophet: Experimenting with Recency

I wanted to take another look at the relative value of recent games, that is, whether prediction is more accurate if we base it upon only a team's recent performance instead of the entire season-to-date. To do this, I calculated the Govan ratings for each team based upon the last "N" games, for various values of N. Here are the results:

Predictor	N	MOV Error	% Correct
Govan	30	10.805	73.5%
Govan	25	10.963	73.1%
Govan	20	11.280	72.1%
Govan	15	11.980	70.2%

With N=30, this is essentially the same as using all the games to date. Smaller values of N discard the oldest games and keep only the most recent N. The trend is clear: every reduction in N makes both MOV Error and % Correct worse, so it seems that even the oldest games add useful information for prediction.
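
As a rough illustration of the windowing step, here is a minimal Python sketch. It assumes the window is applied per team and that each game is a simple (home, away, home_score, away_score) tuple in chronological order; both the tuple layout and the name last_n_games are hypothetical, and the rating calculation itself is not shown.

```python
from collections import defaultdict, deque

def last_n_games(games, n):
    """Return {team: [that team's most recent n games]}.

    `games` must be in chronological order. Each game is assumed to be a
    (home, away, home_score, away_score) tuple (a hypothetical layout).
    """
    # A deque with maxlen=n silently drops the oldest game when a new one arrives.
    recent = defaultdict(lambda: deque(maxlen=n))
    for game in games:
        home, away = game[0], game[1]
        recent[home].append(game)
        recent[away].append(game)
    return {team: list(window) for team, window in recent.items()}
```

The ratings would then be computed from only the games in these windows rather than from the full season-to-date.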

If using games from the very beginning of the season is useful in predicting games at the end of the season, perhaps using even older games would be useful. Perhaps we should include (say) the previous season's games when predicting. ("That's crazy talk!" I hear you say. We shall see...)

It's fairly straightforward to modify my workflow so that it doesn't reset at the beginning of each new season. I currently use games from the 2008-2009, 2009-2010 and 2010-2011 seasons. If we run without resetting at the beginning of each new season (essentially treating the data as one long season), this is the comparative performance for our favorite two predictors:

Predictor	MOV Error	% Correct
Govan (normal)	10.80	73.5%
Govan (merged seasons)	11.04	73.4%
TrueSkill (normal)	10.88	73.3%
TrueSkill (merged seasons)	10.99	73.4%

Interestingly, performance does not suffer much from including previous seasons. As you might expect, TrueSkill suffers less of a hit than Govan. Since TrueSkill essentially updates a hypothesis with every game, it's better able to discard the contrary evidence from earlier games.
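
To make the difference between the "normal" and "merged seasons" runs concrete, here is a small sketch of the idea. It is not my actual workflow: walk_forward and the reset_each_season flag are made-up names, and a real TrueSkill run would update ratings incrementally rather than keep a list of past games.

```python
def walk_forward(seasons, reset_each_season=True):
    """Yield (history, game) pairs, where `history` is the list of games
    available for rating teams at the moment `game` is predicted.

    `seasons` is a list of seasons, each a chronologically ordered list of games.
    """
    history = []
    for season in seasons:
        if reset_each_season:
            history = []  # "normal": every season starts from a clean slate
        for game in season:
            yield list(history), game  # copy so later appends don't alter it
            history.append(game)
```

With reset_each_season=False the history simply keeps growing across season boundaries, which is the "one long season" treatment.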

Now let's look at performance within a season, as I did in an earlier post. Here I've broken the season down into four quarters of ~1000 games each, and I show the performance both with and without using the previous season's games. (These are really the last four quintiles of the season -- in both cases I throw away the first 1000 games of each season.)

Quarter of the Season	Govan (normal) MOV Error	Govan (merged) MOV Error	Improvement
1st	11.40	11.23	+1.5%
2nd	10.70	11.44	-6.9%
3rd	10.52	11.02	-4.8%
4th	10.05	10.89	-8.4%

Not surprisingly, using the previous season improves prediction during the first quarter of the season. (TrueSkill shows the same pattern, improving by about 2.2% in the first quarter.) Even though the previous season's performance isn't a good predictor, it is apparently better than starting everyone off with a clean slate.
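
The Improvement column in the table above is consistent with the relative reduction in MOV Error, measured against the normal run. A short sketch of that arithmetic (the function name improvement_pct is just for illustration):

```python
def improvement_pct(normal_error, merged_error):
    """Percent reduction in MOV error relative to the normal run.

    Positive means the merged run has the lower (better) error.
    """
    return 100.0 * (normal_error - merged_error) / normal_error

# (normal, merged) MOV errors per quarter, copied from the table above.
quarters = {
    "1st": (11.40, 11.23),
    "2nd": (10.70, 11.44),
    "3rd": (10.52, 11.02),
    "4th": (10.05, 10.89),
}

for name, (normal, merged) in quarters.items():
    print(f"{name}: {improvement_pct(normal, merged):+.1f}%")
```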

This suggests that we should use the previous season's data until we have enough current season data to make good predictions. We should be able to do this by combining the recent games window and the merged seasons. If we always predict using (say) the previous 30 games, and include the previous season, that should be close to what we want, although it might not throw away the previous season's games fast enough. Sadly, I can't seem to get that to work (or rather, it runs, but returns results so bad that I think it's broken).

As an alternative, we can "prime" each season by including the last 1000 games of the previous season. This has the disadvantage that the primer games impact the ratings for the whole season, but it's a simple approach and easy to implement:

Predictor	MOV Error	% Correct
Govan (normal)	10.80	73.5%
Govan (primed)	10.78	73.1%
TrueSkill (normal)	10.88	73.3%
TrueSkill (primed)	10.91	73.1%
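
For what it's worth, the priming step amounts to something like the following sketch. The function name primed_season and the list-of-games representation are assumptions for illustration only.

```python
def primed_season(previous_season, current_season, primer_size=1000):
    """Return the game sequence for a season, prefixed with the last
    `primer_size` games of the previous season.

    Both arguments are chronologically ordered lists of games.
    """
    # Guard against primer_size == 0: previous_season[-0:] would be the whole list.
    primer = previous_season[-primer_size:] if primer_size > 0 else []
    return primer + current_season
```

Ratings are then computed over this combined sequence, which is why the primer games keep influencing the ratings for the whole season.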

Again, if we look at the performance by quarters, we see that performance in the first quarter games is much improved, but that's offset by poorer performance in the later quarters. Ideally we would use the primer games early in the season and then drop them once enough current season games are available. That's a little trickier to implement. Generally speaking, it isn't always possible to "back out" a game from the ratings. We can address that by "restarting" partway through the season without the primer games. Here's what that yields:

Predictor	MOV Error	% Correct
Govan (normal)	10.80	73.5%
Govan (primed + restart)	10.74	70.7%
TrueSkill (normal)	10.88	73.3%
TrueSkill (primed + restart)	10.94	70.5%

The result is a mixed bag. MOV Error improves by about 0.5% for Govan (which is really a 2% improvement in the first quarter games), but gets worse by a similar amount for TrueSkill. Meanwhile, % Correct drops rather dramatically for both ratings.
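
The "primed + restart" idea can be sketched roughly as below. The restart point (restart_at) is a hypothetical parameter, since I haven't said exactly where in the season the restart happens, and history_for_game is an illustrative name rather than part of my actual code.

```python
def history_for_game(primer, current_season, index, restart_at):
    """Return the games used to rate teams before predicting
    current_season[index].

    Before `restart_at` the primer games are included. From `restart_at`
    onward the ratings are rebuilt from current-season games only, which
    avoids having to "back out" individual games from the ratings.
    """
    played = current_season[:index]
    if index < restart_at:
        return primer + played
    return played
```

Rebuilding from scratch at the restart point is cruder than removing the primer games one at a time, but it works with any rating system.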

In general, the evidence suggests that there is no benefit to limiting the number of current season games when calculating ratings, and there is mixed benefit in early season games from incorporating results from the previous season.


Friday, July 8, 2011
