Friday, August 19, 2011

The Relative Importance of Possessions

One reason we want to calculate the number of possessions in a game is to compute "tempo-free" stats such as Points Per Possession (PPP). By factoring out possessions, we get better comparisons between teams that play at different paces. One reason we want to predict the number of possessions in a game is to predict the Margin Of Victory (MOV) and other statistics that depend upon the pace of the game.

For example, suppose that Duke is playing Maryland, that Duke's predicted Points Per Possession for this game is PPP_Duke, and that Maryland's is PPP_Maryland. If we know the number of possessions that will occur in the game, we can then predict the MOV as:
MOV_predicted = Poss_predicted * (PPP_Duke - PPP_Maryland)
One appealing feature of this predictor is that it is fairly orthogonal to predicting MOV from won-loss records or even previous MOV, so (if it is any good) it would likely be a good candidate for an ensemble predictor with methods like TrueSkill or Govan.
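As a quick illustration of the formula above, here is a minimal Python sketch. The function name `predict_mov` and the input numbers are made up for the example; they are not real predictions for either team.

```python
def predict_mov(possessions: float, ppp_a: float, ppp_b: float) -> float:
    """Predicted margin of victory for team A over team B.

    Implements MOV = possessions * (PPP_A - PPP_B).
    """
    return possessions * (ppp_a - ppp_b)


# Hypothetical inputs: 68 possessions, team A at 1.10 PPP, team B at 1.02 PPP.
mov = predict_mov(possessions=68, ppp_a=1.10, ppp_b=1.02)
print(round(mov, 2))  # 5.44
```

Note how the same PPP gap produces a larger margin in a faster game: the possession count only scales the difference in efficiency.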

We've already seen that we don't (yet) have a very good method for predicting the number of possessions in a game. But we don't know how important Poss_predicted is in the equation above; it could be that we'd do fine with a fairly poor predictor and shouldn't waste too much time trying to improve it. So how can we estimate the importance of Poss_predicted?

One approach is to bound the importance by assuming that we can't predict possessions at all, and see how well we can do predicting MOV based upon only the PPP for the two teams.  If we take the actual PPPs for games (as if we had a perfect predictor for PPP) and stick them into a linear regression to predict MOV, we get this performance:

Predictor                       Error    % Correct
Perfect PPP information only    1.38     91%

This is an amazingly good result. Without any notion of the pace of the game, we can still predict the MOV to within ~1.5 points if we know the relative offensive efficiencies of the two teams.

(If you're wondering why we only get 91% of the games correct, it's because the regression optimizes for MOV, and it turns out the MOV prediction is better when the home team is slightly overweighted.)
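To make the "PPP only" experiment concrete, here is a rough Python sketch on synthetic games. Everything in it is an assumption for illustration: the data is randomly generated rather than taken from real games, the regression uses a single feature (the PPP difference) with an intercept instead of whatever feature set was actually used, and mean absolute error stands in for the "Error" column.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def make_synthetic_games(n, seed=0):
    """Made-up games: MOV = possessions * (PPP_home - PPP_away)."""
    rng = random.Random(seed)
    ppp_diffs, movs = [], []
    for _ in range(n):
        possessions = rng.uniform(60, 75)  # assumed range of game pace
        ppp_home = rng.gauss(1.0, 0.1)     # assumed spread of efficiency
        ppp_away = rng.gauss(1.0, 0.1)
        ppp_diffs.append(ppp_home - ppp_away)
        movs.append(possessions * (ppp_home - ppp_away))
    return ppp_diffs, movs


# Fit MOV from the PPP difference alone; possessions are never shown to the model.
ppp_diffs, movs = make_synthetic_games(500)
intercept, slope = fit_line(ppp_diffs, movs)
errors = [abs(intercept + slope * d - m) for d, m in zip(ppp_diffs, movs)]
mae = sum(errors) / len(errors)
print(f"slope={slope:.1f} intercept={intercept:.2f} mean abs error={mae:.2f}")
```

In this toy setup the fitted slope acts like an average possession count, so the only error left is what comes from games that run faster or slower than average. That is the portion a possessions predictor could hope to recover.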

If we throw our best possessions predictor into the model, the error drops by about 20%:

Predictor                                                 Error    % Correct
Perfect PPP information only                              1.38     91%
Perfect PPP information + "Best" possessions predictor    1.07     91%

This suggests that it's far more important to accurately predict PPP, and that even our current fairly poor possessions predictor may be good enough.
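The "about 20%" figure can be checked with a line of arithmetic. This sketch assumes the two Error values in the table read as 1.38 and 1.07; the helper name `relative_reduction` is made up for the example.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which the error shrinks relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Error values as read from the table (assumed: 1.38 without, 1.07 with possessions).
print(f"{relative_reduction(1.38, 1.07):.1%}")  # 22.5%
```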