The Predictive Analytics Challenge (as well as countless office pools) uses accuracy at predicting the NCAA tournament as its measure. That makes for an interesting challenge (and interesting office discussions), but it has a few problems as a metric. First, the sample size for testing is rather small: only 63 (or so) games a year. Second, picking all the games before any have been played, and scoring different rounds with different values, introduces a host of strategic complications. Finally, unlike 95% of college basketball games, the tournament is played on a neutral court.
For these reasons, I prefer to measure the predictor against individual regular season games. Obviously I'll also use it to try to predict the Tournament -- I just won't measure its performance against Tournament games.
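To put a rough number on the sample-size problem, here is a minimal sketch that computes the standard error of a win/loss accuracy estimate, `sqrt(p * (1 - p) / n)`, using the worst case p = 0.5. The 63-game figure comes from the tournament. The 5,000-game regular-season sample is a hypothetical round number chosen for comparison, not a count from real data.

```python
import math


def accuracy_standard_error(n_games: int, p: float = 0.5) -> float:
    """Standard error of an accuracy (proportion correct) measured over n_games.

    p = 0.5 gives the largest possible standard error, so it is a
    conservative choice when the true accuracy is unknown.
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    return math.sqrt(p * (1.0 - p) / n_games)


# 63 tournament games vs. a hypothetical 5,000-game regular-season sample
for n in (63, 5000):
    se = accuracy_standard_error(n)
    print(f"{n:5d} games: +/- {100 * se:.1f} percentage points")
```

With 63 games, one standard error is about 6.3 percentage points, so two predictors would have to differ by a wide margin before a single tournament could tell them apart. The larger (hypothetical) sample shrinks that to about 0.7 points.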
So how should we measure the performance of our predictor? The obvious (and simplest) measure is whether it predicts the correct outcome. That's a good metric, but it has some flaws. First, predicting the correct outcome of many games is trivial: when Duke plays Wake Forest, it isn't too difficult to predict with some confidence that Duke will win. Second, it's really only useful for entering Tournament contests.
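As a minimal sketch of this first metric, assume each prediction is stored as a predicted margin from the home team's point of view (positive means the home team is picked to win). The percentage correct is then the share of games where the predicted and actual margins have the same sign. The function name and the sample margins are hypothetical.

```python
from typing import Sequence


def percent_correct(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Percentage of games in which the predicted winner actually won.

    Margins are from the home team's perspective (positive = home team wins).
    A predicted margin of exactly 0 is treated as no pick and counts as wrong.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty margin lists")
    correct = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    return 100.0 * correct / len(actual)


# Hypothetical predicted and actual margins for four games
predicted = [5.0, -3.0, 12.0, 1.5]
actual = [8, -4, 20, -2]
print(percent_correct(predicted, actual))  # 75.0
```

Every game counts the same here, so an easy Duke-Wake Forest pick is worth exactly as much as a genuine toss-up.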
A second measure we can use is to try to predict the Margin of Victory (MOV) and measure how close we got. This measure makes predicting the Duke-Wake Forest matchup more interesting -- Duke is very likely to win, but by how much? It's also useful if we want to match our predictor against the Las Vegas bookmakers, who release a "line" on every game that represents their best prediction for Margin of Victory. Given the strong financial motivation the bookmakers have to be good predictors, they should be a good test of our predictor.
(Strictly speaking, the bookmakers may not set the line to their best prediction of MOV. They may set or move the line to equalize betting on both sides of the game to minimize their financial risk.)
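The post doesn't pin down exactly how "how close we got" is scored. One common choice, assumed here, is the mean absolute error between predicted and actual margins. The sketch below scores a predictor and a bookmaker's line the same way, which is one way the comparison against Las Vegas could be run. All margins are hypothetical, and the line is assumed to have already been converted to a predicted home-team margin (a home team favored by 6.5 points becomes +6.5).

```python
from typing import Sequence


def mov_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute difference between predicted and actual margin of victory."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty margin lists")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical margins from the home team's perspective (positive = home win)
actual = [8, -4, 20, -2]
our_predictions = [5.0, -3.0, 12.0, 1.5]
vegas_line = [6.5, -1.5, 15.0, 2.0]  # hypothetical line, as home-team margins

print(f"predictor: {mov_error(our_predictions, actual):.3f}")  # predictor: 3.875
print(f"line:      {mov_error(vegas_line, actual):.3f}")       # line:      3.250
```

Unlike percentage correct, this rewards getting the size of a lopsided game right: the 20-point win predicted as a 12-point win costs 8 points of error even though the winner was picked correctly.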
I will use both measures to assess the performance of the predictor. I've assessed a number of prediction models with both metrics, and optimizing one almost always improves the other as well. Where that isn't true, I'll weigh the trade-off (rather arbitrarily) and pick one over the other.
Now that we've established our metrics of performance, let's think about how good our predictor can be. Actually, let's start off by thinking about how bad our predictor can be.
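If we know absolutely nothing about a game and randomly choose one of the two teams to win, we will predict the correct team 50% of the time. Similarly, if we know nothing and simply predict a margin of zero, our error in every game is the entire margin of victory, and, as it happens, the average MOV is about 15 points. So that sets a lower bound on prediction: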
| Predictor | % Correct | MOV Error |
| Random | 50% | ~15 points |
(If you have a predictor that does worse than that, take the opposite of its predictions and you'll have a better predictor :-)
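Both baseline numbers can be reproduced with a short sketch. A coin-flip pick is right about half the time no matter which games are played, and predicting a margin of zero for every game makes the MOV error equal to the average absolute margin. The margins below are made-up placeholders, not real results, so they won't reproduce the roughly 15-point figure; fed a real season's results, `zero_margin_error` (a hypothetical helper) would return that season's average MOV.

```python
import random
from typing import Sequence


def coin_flip_accuracy(actual: Sequence[float], rng: random.Random) -> float:
    """Fraction of games a random pick gets right (expected value 0.5).

    Margins are from the home team's perspective; college games can't end
    in a tie, so every margin is assumed to be nonzero.
    """
    correct = 0
    for margin in actual:
        pick_home = rng.random() < 0.5  # flip a fair coin
        correct += pick_home == (margin > 0)
    return correct / len(actual)


def zero_margin_error(actual: Sequence[float]) -> float:
    """MOV error when every game is predicted to be a tie (margin 0).

    The error in each game is then the full actual margin, so this is
    simply the average absolute margin of victory.
    """
    return sum(abs(m) for m in actual) / len(actual)


# Placeholder margins (not real data), repeated to get many coin flips
actual = [8, -4, 20, -2, 31, -15] * 20000
rng = random.Random(42)
print(f"coin flip accuracy: {coin_flip_accuracy(actual, rng):.3f}")
print(f"zero-margin MOV error: {zero_margin_error(actual):.2f}")  # 13.33
```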
Interestingly, with a tiny bit more information (and I mean that literally), we can do much better. That's a topic for the next posting.