Suppose that Maryland plays Duke in Cameron. Maryland is ahead by 8 points at halftime, but ends up losing by 2 points. The next week, Wake Forest plays Duke in Cameron. Wake Forest is down 1 point at halftime, and also ends up losing by 2 points.
On the basis of those games, which is the better team, Maryland or Wake Forest?
There are reasonable arguments for several viewpoints. Certainly you might argue that there's no reason to think either is better -- they both ended up at the same place and the halftime score is immaterial. You could also argue that Maryland is better -- they outplayed Duke, at least for a half, which is more than Wake Forest managed. Or you could argue that Wake Forest is better -- they were within a small margin of error of beating Duke in both halves, which is more than Maryland can say.
Or consider the case where Maryland wins the first half by 8 and loses the second by 10, while Wake Forest loses the first half by 10 and wins the second half by 8. Is that evidence that either team is better?
In reading papers on rating systems over the past few years, I've noticed that many authors devise a rating system to reflect their personal belief on questions like this one. I wouldn't be surprised at all to read a paper that said (in effect) "Based upon the halftime score, Maryland is clearly the better team, and here's a rating system that reflects that." So we have rating systems that discount blowouts, and rating systems that emphasize non-conference road wins, and so on.
As long-time readers of this blog know, my own outlook is different. What I believe is important or unimportant isn't, well, important. What counts is whether something improves predictive performance. So when I started collecting scoring by halves, my purpose was to see how that could be best used to improve prediction.
The first thing I did was to create some baseline statistics based on the scoring by halves, such as a team's average score in the first half, in the second half, average score of opponents in the first half, average MOV by half, and so on. I didn't expect these statistics to have much predictive value. For one thing, it seems clear that the strength of the opposing team is an important factor in understanding a team's performance, and none of these baseline statistics reflect that strength of schedule.
Still, I believe in testing over assumptions, and testing revealed at least one statistic that did have some predictive value: the ratio of a team's scoring in the first half to its scoring in the second half. As I hinted earlier, there's some correlation between team strength and the ratio of scoring by halves. Good teams generally have high ratios -- that is, they do more of their scoring in the first half than the average team does. That's a pretty intriguing result. Some work by Monte McNair shows that teams generally improve their offensive efficiency as the game progresses, so it may be that good teams play more efficiently from the start of the game. There are probably more interesting results to be had by analyzing and understanding this one.
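To make the ratio statistic concrete, here is a minimal Python sketch. The records and the function name `half_scoring_ratio` are made up for illustration, and I'm assuming the ratio is taken over season totals rather than averaged game by game; either reading fits the description above.

```python
from collections import defaultdict

# Hypothetical records: (team, first_half_points, second_half_points).
games = [
    ("Maryland", 40, 30),
    ("Maryland", 35, 35),
    ("Wake Forest", 30, 38),
]

def half_scoring_ratio(records):
    """Return {team: total first-half points / total second-half points}."""
    first = defaultdict(int)
    second = defaultdict(int)
    for team, h1, h2 in records:
        first[team] += h1
        second[team] += h2
    # Skip teams with no second-half points to avoid dividing by zero.
    return {t: first[t] / second[t] for t in first if second[t] > 0}

ratios = half_scoring_ratio(games)
```

A ratio above 1 means the team does more of its scoring before halftime; in this toy data that is Maryland but not Wake Forest.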
After testing the baseline statistics, I turned my attention to using the scoring by halves with strength measures like RPI, Trueskill and so on. These measures try to assign a team a single numeric strength value based upon game outcomes. I wanted to try to extend the measures to include the scoring by halves information and see whether that improved the predictive value of the measures.
There are several ways to go about this, but one straightforward approach is to treat each half like another separate game. So in the case of Maryland above, we'd calculate our measure as if Maryland had played Duke three times -- winning once by 8, losing once by 10, and losing once by 2. We can also try variants, such as using only the first halves of games. So we can calculate all the variants and test to see which one has the best predictive value.
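Here is a rough sketch of that expansion in Python. The `Result` type, the function name, and the actual point totals are hypothetical; only the margins (8, -10, -2) come from the Maryland example. Note that this simple version lumps any overtime scoring into the "second half."

```python
from typing import NamedTuple

class Result(NamedTuple):
    team: str
    opponent: str
    margin: int  # positive means `team` outscored `opponent`

def expand_by_halves(team, opponent, team_h1, opp_h1, team_final, opp_final):
    """Turn one game into three pseudo-games: first half, second half, full game."""
    h1 = team_h1 - opp_h1
    full = team_final - opp_final
    h2 = full - h1  # whatever happened after halftime (includes overtime, if any)
    return [
        Result(team, opponent, h1),
        Result(team, opponent, h2),
        Result(team, opponent, full),
    ]

# Maryland leads by 8 at the half (made-up score 38-30) and loses 68-70.
results = expand_by_halves("Maryland", "Duke", 38, 30, 68, 70)
margins = [r.margin for r in results]
```

Each of the three results can then be fed to the rating algorithm as if it were its own game, and the variants (first halves only, second halves only, all three) are just different subsets of this list.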
I've initially applied this approach to Trueskill. To begin with, I measured the performance of the baseline Trueskill-MOV metric on my current test set. This is currently the best single predictive measure in the Performance Machine's metrics.
Predictor | % Correct | MOV Error |
---|---|---|
Baseline Trueskill-MOV | 73.6% | 11.59 |
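For readers who want to reproduce these two columns, here is one plausible way to compute them. This is an assumption on my part for illustration: "% Correct" is taken as the share of games where the predicted winner matches the actual winner, and "MOV Error" as the mean absolute difference between predicted and actual margin. The function name and sample numbers are made up.

```python
def evaluate(predicted, actual):
    """Score predictions given equal-length lists of margins (home minus away).

    Returns (percent_correct, mov_error), where mov_error is assumed to be
    the mean absolute error of the predicted margin.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(predicted, actual))
    # A pick is correct when the predicted and actual margins agree on the winner.
    correct = sum(1 for p, a in pairs if (p > 0) == (a > 0))
    mov_error = sum(abs(p - a) for p, a in pairs) / len(pairs)
    return 100.0 * correct / len(pairs), mov_error

# Four made-up games: two winners picked correctly, two missed.
pct_correct, mov_error = evaluate([5, -3, 10, 2], [7, 4, 12, -1])
```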
The first tests calculated the metric from each half individually, and then from all three results (first half, second half, and full game) together.
Predictor | % Correct | MOV Error |
---|---|---|
Baseline Trueskill-MOV | 73.6% | 11.59 |
First Half Only | 72.1% | 11.97 |
Second Half Only | 70.9% | 12.69 |
All Three | 73.4% | 11.55 |
There are a couple of things to note here. As you might guess, neither half by itself is better than using the game score. More surprising is that performance in the first half is much better for prediction than performance in the second half. (To go back to our second example above, this is a reason to believe that Maryland is a better team than Wake Forest.) And using all three together is marginally better (at least in MOV Error) than using just the final score.
So far this has treated each half like a separate game. But one could argue that a Margin of Victory of 4 in a half is the equivalent of an MOV of 8 in a whole game. We can test this by applying various modifiers to the half scores and to the bonus they receive in the algorithm. The best results I could find were these:
Predictor | % Correct | MOV Error |
---|---|---|
Baseline Trueskill-MOV | 73.6% | 11.59 |
Best First Half Only | 73.2% | 11.78 |
Best Second Half Only | 72.2% | 12.25 |
All Three | 73.4% | 11.51 |
With tweaking, all of the variants could be improved somewhat. Using all three was about 1/10 of a point improvement in MOV Error over the baseline.
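As an illustration of the kind of modifier being tested, the sketch below scales half margins before they go into the rating algorithm. The function name and the `half_scale` parameter are hypothetical, and the default of 2.0 simply encodes the "4 in a half equals 8 in a game" argument; it is not the tuned value behind the table above.

```python
def scaled_margins(h1_margin, final_margin, half_scale=2.0):
    """Return the three margins for one game, with half margins rescaled.

    half_scale is a tunable modifier: 1.0 treats a half like a full game,
    2.0 treats a 4-point half as equivalent to an 8-point game.
    """
    h2_margin = final_margin - h1_margin
    return {
        "first_half": h1_margin * half_scale,
        "second_half": h2_margin * half_scale,
        "game": float(final_margin),
    }

# Maryland: up 8 at the half, loses by 2.
maryland = scaled_margins(8, -2)
```

Testing then amounts to sweeping `half_scale` over a range of values and keeping whichever gives the best predictive performance.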
Another possibility is to use the two half scores and ignore the game score. With some tweaking to count the first half about twice as much as the second half, this turns out to be very effective:
Predictor | % Correct | MOV Error |
---|---|---|
Baseline Trueskill-MOV | 73.6% | 11.59 |
Only Half Scores | 74.3% | 11.46 |
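To show what "counting the first half about twice as much" does, here is a toy Python sketch. It is not the Trueskill update itself; it just collapses the two half margins into one weighted average so the effect of the weighting is visible. The function name and the exact 2:1 weights are assumptions for illustration.

```python
def weighted_half_margin(h1_margin, h2_margin, h1_weight=2.0, h2_weight=1.0):
    """Weighted average of the two half margins; the final score is ignored."""
    total_weight = h1_weight + h2_weight
    return (h1_weight * h1_margin + h2_weight * h2_margin) / total_weight

# The second example from the top of the post:
# Maryland wins the first half by 8 and loses the second by 10;
# Wake Forest loses the first half by 10 and wins the second by 8.
maryland = weighted_half_margin(8, -10)
wake_forest = weighted_half_margin(-10, 8)
```

Both teams lost to Duke by 2, but with the first half weighted double, Maryland's performance comes out ahead of Wake Forest's.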
I find this a pretty surprising result. Getting a better strength metric by ignoring the game outcomes is non-intuitive (to say the least) and goes against the typical sports punditry about how winning is the only thing that matters.
This metric is the single best metric in the PM's arsenal, and was used to generate the PM's Top Twenty.