Thursday, June 23, 2011

Another Experiment

I don't recall now what prompted me, but I recently decided to look at how the performance of the predictors changes over the course of the season.  Just as I broke the games down by mismatches in this post, I would subdivide the games according to when they occurred in the season: early, middle, or late.

I have about 4000 games in the training data for each season (2009, 2010, 2011), so I actually broke those down into quarters to give me roughly 1000 games in each portion of the season.  (I remind the reader that I remove the first 1000 games of the season from the training set because the ratings aren't yet reliable.  So when I talk about the first 1000 games in this posting, those are actually the second thousand games played in that season.)  I then tested the performance of the two best predictors: TrueSkill + MOV adjustment, and the Govan ratings.  In this case, I think it's easier to look at a graph of the results rather than a table.  Let's start with the MOV Error by quarters of the season:

MOV Error By Quarter

As you might expect, the error goes down (performance improves) as the season progresses.  On average, the improvement is slightly more than 1 point.  (In 2009, both ratings perform slightly worse in the 4th quarter of the season, but the overall trend seems clear.)  Presumably more games lead to a more accurate rating and hence a more accurate prediction.

Now let's look at a similar chart for the percentage of correct predictions:

% Correct Predictions by Quarter

This chart is a complete curveball.  Performance at correctly picking the winner is by far best in the first quarter, drops dramatically through the third quarter, and then rebounds substantially in the fourth quarter.  This is completely unexpected, particularly in light of the MOV Error, which performs as we'd intuitively expect.
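
For readers who want to try the same breakdown, here is a minimal sketch of how the two per-quarter numbers could be computed.  It is not the code behind the charts.  It assumes each game has already been reduced to a hypothetical (predicted margin, actual margin) pair in chronological order, and it uses mean absolute error as the MOV Error, which is an assumption since the error measure isn't spelled out here.

```python
from statistics import mean

def split_into_parts(games, n_parts=4):
    """Split a chronologically ordered list into n_parts contiguous chunks
    of (nearly) equal size. Assumes len(games) >= n_parts."""
    size, extra = divmod(len(games), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(games[start:end])
        start = end
    return parts

def part_metrics(part):
    """Return (MOV error, fraction of winners picked correctly) for one chunk.

    Each game is a (predicted_margin, actual_margin) pair from the home
    team's point of view. MOV error here is mean absolute error, which is
    an assumption made for this sketch.
    """
    mov_error = mean(abs(pred - actual) for pred, actual in part)
    # A pick is correct when predicted and actual margins have the same sign.
    pct_correct = mean(
        1.0 if (pred > 0) == (actual > 0) else 0.0 for pred, actual in part
    )
    return mov_error, pct_correct

# Made-up numbers, purely to show the shape of the input.
games = [(5, 3), (-2, 4), (1, 1), (-3, -10), (2, -1), (6, 6), (4, 2), (-1, -2)]
by_quarter = [part_metrics(p) for p in split_into_parts(games)]
```

With real data, `games` would hold the roughly 4000 post-warm-up games of one season, and `by_quarter` would give one (error, % correct) pair per quarter.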

I can hazard some guesses as to what is going on, but none seem particularly explanatory.  More experiments would seem to be in order.

One question is whether the prediction performance is high in the first quarter because (1) the rating is operating on only a few games, or (2) the teams are playing differently (more predictably) in that quarter.  To tell these apart, I could restart the rating at the beginning of each quarter.  If (1) is true, then we'd expect to see similar performance in all four quarters, since every quarter's ratings would then be built from only that quarter's games.
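
A rough sketch of the restart experiment follows.  To keep it self-contained it uses a simple Elo-style rating as a stand-in, not TrueSkill or the Govan ratings, and the function name, the `k` value, and the game format are all made up for illustration.  It also skips the 1000-game warm-up described earlier.

```python
from collections import defaultdict

def accuracy_by_part(parts, reset_each_part, k=20.0):
    """Fraction of winners picked correctly in each part of the season.

    parts: list of lists of (home, away, home_margin), in chronological order.
    reset_each_part: if True, all ratings are thrown away at the start of
    every part, so each part is rated only on its own games.
    The rating is a plain Elo-style update, used only as a stand-in.
    """
    ratings = defaultdict(float)
    results = []
    for part in parts:
        if reset_each_part:
            ratings = defaultdict(float)  # forget everything learned so far
        correct = 0
        for home, away, margin in part:
            diff = ratings[home] - ratings[away]
            # Predict before updating, so every game is scored out of sample.
            # An exact rating tie is arbitrarily treated as a pick for the home team.
            if (diff >= 0) == (margin > 0):
                correct += 1
            expected = 1.0 / (1.0 + 10 ** (-diff / 400.0))
            outcome = 1.0 if margin > 0 else 0.0
            ratings[home] += k * (outcome - expected)
            ratings[away] -= k * (outcome - expected)
        results.append(correct / len(part))
    return results
```

Running this once with `reset_each_part=False` and once with `True` on the same four quarters gives the two curves to compare: if the restarted version is roughly flat across quarters, that points toward explanation (1).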

Another thought is that teams play more non-conference games in the first quarter of the season.  To test this hypothesis, I could compare prediction performance on conference vs. non-conference games.
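
That comparison could look something like the sketch below.  The `conference` lookup table and the tuple layout are hypothetical; any source that maps each team to its conference would do.

```python
def accuracy_by_game_type(games, conference):
    """Compare winner-picking accuracy on conference vs. non-conference games.

    games: iterable of (home, away, predicted_margin, actual_margin).
    conference: dict mapping team name -> conference name (hypothetical input).
    Returns a dict with the fraction correct for each game type, or None
    when there are no games of that type.
    """
    tally = {"conference": [0, 0], "non-conference": [0, 0]}  # [correct, total]
    for home, away, pred, actual in games:
        same = conference[home] == conference[away]
        kind = "conference" if same else "non-conference"
        tally[kind][0] += int((pred > 0) == (actual > 0))
        tally[kind][1] += 1
    return {kind: (c / n if n else None) for kind, (c, n) in tally.items()}

# Made-up teams and margins, purely for illustration.
conference = {"A": "East", "B": "East", "C": "West", "D": "North"}
games = [("A", "B", 3, 5), ("B", "A", 2, -1), ("C", "D", 10, 12), ("A", "C", -1, -4)]
result = accuracy_by_game_type(games, conference)
```

Running it separately on each quarter would show whether the first-quarter bump tracks the share of non-conference games.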

In any case, this is a very puzzling (and hopefully eventually enlightening) result!

2 comments:

  1. It is almost certainly because there are many more mismatches early in the season, before teams start playing the more evenly-matched conference schedules.

  2. Perhaps roster and starting-lineup changes due to injuries shift the skill of the team enough to bias the model?
