I have about 4000 games in the training data for each season (2009, 2010, 2011), so I broke each season down into quarters of an even 1000 games apiece. (I remind the reader that I remove the first 1000 games of each season from the training set because the ratings aren't yet reliable, so when I talk about the first 1000 games in this posting, those are actually the second thousand games played that season.) I then tested the performance of the two best predictors: TrueSkill + MOV adjustment, and the Govan ratings. In this case, I think it's easier to look at a graph of the results than a table. Let's start with the MOV error by quarters of the season:
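For concreteness, here is a minimal sketch of how that per-quarter evaluation could be computed. This is not my actual code; it assumes each game is reduced to a hypothetical (predicted MOV, actual MOV) pair, in chronological order for the season:

```python
# Hedged sketch: drop the first 1000 "burn-in" games (ratings not yet
# reliable), split the rest into quarters of 1000 games, and compute the
# mean absolute MOV error and the fraction of correct winner predictions
# for each quarter. Each game is a hypothetical (predicted_mov, actual_mov)
# pair from a home team's perspective.

def quarterly_performance(games, burn_in=1000, quarter_size=1000):
    usable = games[burn_in:]          # skip games with unreliable ratings
    results = []
    for q in range(4):
        chunk = usable[q * quarter_size:(q + 1) * quarter_size]
        if not chunk:
            break
        mov_error = sum(abs(p - a) for p, a in chunk) / len(chunk)
        # a prediction is "correct" when it picks the right winner,
        # i.e. predicted and actual MOV have the same sign
        pct_correct = sum((p > 0) == (a > 0) for p, a in chunk) / len(chunk)
        results.append((q + 1, mov_error, pct_correct))
    return results
```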
[Chart: MOV Error by Quarter]
Now let's look at a similar chart for the percentage of correct predictions:
[Chart: % Correct Predictions by Quarter]
I can hazard some guesses as to what is going on, but none seems particularly convincing. More experiments are in order.
One question is whether prediction performance is high in the first quarter because (1) the rating is operating on only a few games, or (2) the teams are playing differently (more predictably) in that quarter. To distinguish these, I could restart the rating at the beginning of each quarter. If (1) is true, then we'd expect to see similar performance in all four quarters.
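The restart experiment could be sketched roughly as follows. `make_rating`, `predict_mov`, `update`, and `actual_mov` are hypothetical stand-ins for the real rating code, not its actual interface:

```python
# Hedged sketch of the restart experiment: build a fresh rating at the
# start of each quarter and measure its mean absolute MOV error on that
# quarter's games, predicting each game before rating it.

def quarterly_errors_with_restart(quarters, make_rating):
    """quarters: list of per-quarter game lists; make_rating: factory
    for a fresh (hypothetical) rating object."""
    errors = []
    for games in quarters:
        rating = make_rating()        # restart: no carry-over between quarters
        total = 0.0
        for game in games:
            total += abs(rating.predict_mov(game) - game.actual_mov)
            rating.update(game)       # rate each game only after predicting it
        errors.append(total / len(games))
    return errors
```

If hypothesis (1) holds, the four error numbers should come out similar; if the error still drops after the first quarter despite the restarts, the effect is about the games themselves rather than the rating's maturity.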
Another thought is that teams play more non-conference games in the first quarter of the season. To test this hypothesis, I could compare prediction performance on conference vs. non-conference games.
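That comparison is straightforward to sketch, assuming a hypothetical per-game conference flag alongside the predicted and actual MOV:

```python
# Hedged sketch: split games by a hypothetical is_conference flag and
# compare winner-prediction accuracy on each subset.

def accuracy_by_game_type(games):
    """games: iterable of (is_conference, predicted_mov, actual_mov)."""
    buckets = {True: [0, 0], False: [0, 0]}   # flag -> [correct, total]
    for is_conf, pred, actual in games:
        buckets[is_conf][0] += (pred > 0) == (actual > 0)
        buckets[is_conf][1] += 1
    return {("conference" if k else "non-conference"):
            (c / t if t else None) for k, (c, t) in buckets.items()}
```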
In any case, this is a very puzzling (and hopefully eventually enlightening) result!