Net Prophet: possessions

Friday, August 19, 2011

The Relative Importance of Possessions

One reason to calculate the number of possessions in a game is so that we can compute "tempo-free" stats such as Points Per Possession (PPP). Factoring out possessions gives better comparisons between teams that play at different paces. A second reason is prediction: if we can predict the number of possessions in a game, we can use that to predict the Margin Of Victory (MOV) and other statistics that depend upon the pace of the game.

For example, suppose that Duke is playing Maryland, and the predicted PPPs for this game are PPP_Duke and PPP_Maryland. If we know the number of possessions that will occur in the game (and assume both teams get the same number), we can then predict the MOV as:

MOV_predicted = Poss_predicted * (PPP_Duke - PPP_Maryland)
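As a quick illustration of the formula, here is a minimal sketch. The function name and the input numbers (67 possessions, PPPs of 1.10 and 1.02) are made up for the example and are not predictions from any model.

```python
def predict_mov(poss_predicted: float, ppp_home: float, ppp_away: float) -> float:
    """Predicted margin of victory: possessions times the PPP difference."""
    return poss_predicted * (ppp_home - ppp_away)


# Hypothetical inputs: 67 possessions, Duke at 1.10 PPP, Maryland at 1.02 PPP.
mov = predict_mov(67, 1.10, 1.02)
print(round(mov, 2))  # → 5.36
```

Note that the same PPP gap produces a larger margin in a faster game, which is why the possession count matters at all.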

One of the appealing features of this predictor is that it is fairly orthogonal to predicting MOV from won-loss records or even previous MOV. So if it turns out to be any good, it would likely be a good candidate for an ensemble predictor alongside methods like TrueSkill or Govan.

We've already seen that we don't (yet) have a very good method for predicting the number of possessions in a game. But we don't know how important Poss_predicted is in that equation above; it could be that we'd do fine with a fairly poor predictor and shouldn't waste too much time trying to improve. So how can we estimate the importance of Poss_predicted?

One approach is to bound the importance by assuming that we can't predict possessions at all, and see how well we can do predicting MOV based upon only the PPP for the two teams. If we take the actual PPPs for games (as if we had a perfect predictor for PPP) and stick them into a linear regression to predict MOV, we get this performance:

Predictor	Error	% Correct
Perfect PPP information only	1.38	91%

That's an amazingly good result. Without any notion of the pace of the game, we can still predict the MOV within ~1.5 points if we know the relative offensive efficiencies of the two teams.

(If you're wondering why we only get 91% of the games correct, it's because the regression optimizes for MOV, and it turns out the MOV prediction is better when the home team is slightly overweighted.)
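To see why a PPP-only regression can do so well, here is a rough sketch on synthetic data. Everything in it is an assumption for illustration: the games are randomly generated (possessions around 67, PPPs around 1.0), there is no home-court effect, the error metric is RMSE, and the little least-squares solver is just a stand-in for whatever regression tool you prefer. It is not the data or code behind the table above.

```python
import random


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic games (hypothetical): MOV is possessions times the PPP difference.
random.seed(0)
games = []
for _ in range(2000):
    poss = random.gauss(67, 5)
    ppp_home = random.gauss(1.0, 0.12)
    ppp_away = random.gauss(1.0, 0.12)
    games.append((ppp_home, ppp_away, poss * (ppp_home - ppp_away)))

# Regress MOV on the two PPPs plus an intercept; possessions are NOT an input.
rows = [(h, a, 1.0) for h, a, _ in games]
movs = [m for _, _, m in games]
w_home, w_away, intercept = fit_least_squares(rows, movs)

preds = [w_home * h + w_away * a + intercept for h, a, _ in games]
rmse = (sum((p - m) ** 2 for p, m in zip(preds, movs)) / len(movs)) ** 0.5
print(f"w_home={w_home:.1f} w_away={w_away:.1f} rmse={rmse:.2f}")
```

In this toy setup the fitted weights land near plus and minus the average possession count, so the regression has effectively assumed an average-paced game. The leftover error comes only from games that were faster or slower than average.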

If we throw our best possessions predictor into the model, the error drops by about 22% (from 1.38 to 1.07), while the percentage of games called correctly stays the same:

Predictor	Error	% Correct
Perfect PPP information only	1.38	91%
Perfect PPP information + "Best" possessions predictor	1.07	91%

This suggests that it's far more important to accurately predict PPP, and that even our current fairly poor possessions predictor may be good enough.

Wednesday, August 17, 2011

Possessions, Part 3

In a comment on the last posting, ProbablePicks suggested trying to predict the number of possessions in a game by a regression on the average number of possessions for both teams in their previous games. That was an excuse to add the ability to create averaged stats to my processing framework, so I put that in, debugged it for a while and then created the suggested regression. Here was its performance:

Predictor	Error
Possessions (67)	6.30
Possessions (Split Model)	5.20
Possessions (Regression on Averages)	5.10

It does a little better than the Split Model. The regression equation looks like this:

Poss = 0.665*HPoss_ave + 0.620*APoss_ave - 20.540

This weights the home team slightly more (about 7%) than the away team -- I speculated on this possibility with the Split Model but didn't see a performance improvement in that case.
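For reference, here is the fitted equation as a small function. The coefficients are copied from the equation above; the function name and the sample inputs are just for illustration.

```python
def predict_possessions(home_avg: float, away_avg: float) -> float:
    """Regression on averages, using the coefficients quoted in the post."""
    return 0.665 * home_avg + 0.620 * away_avg - 20.540


# Two teams that both average 67 possessions (hypothetical inputs).
print(f"{predict_possessions(67, 67):.1f}")  # → 65.6

# How much more the home average is weighted than the away average.
print(round(0.665 / 0.620 - 1, 3))  # → 0.073
```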

One speculation I've had is that possessions might be harder to predict in close games. There will usually be more fouling and aggressive defense that might create turnovers and additional possessions. We can (sort of) look at that by measuring the error only on games where the MOV was beyond some cutoff:

Predictor	Error
Possessions (Regression on Averages)	5.10
Possessions (MOV > 8)	4.97
Possessions (MOV > 12)	4.77
Possessions (MOV < -8)	5.17
Possessions (MOV < -12)	5.19

There's an interesting result here: We can do a better job predicting possessions when the home team is winning a blowout, but we do worse predicting possessions when the away team is winning a blowout. I'd be inclined to think this was because games with MOV > 12 (say) are going to skew to the top end of the possessions range anyway, and the compression of the range of possible results will reduce the error. But that's contradicted by the results for away team blowouts, so there's presumably some other explanation. Of course, for predictive purposes this doesn't matter because we won't know the MOV anyway, but it's an intriguing result nonetheless.
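The filtering itself is simple; a sketch of it is below. The four games are invented, the field names are hypothetical, and the error is computed as RMSE, which is an assumption on my part about what "Error" means here. MOV is taken from the home team's point of view, so negative values are away wins.

```python
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def error_where(games, keep):
    """Error of the possession predictions over games whose MOV satisfies `keep`."""
    subset = [(g["pred"], g["actual"]) for g in games if keep(g["mov"])]
    return rmse(subset) if subset else None


# Hypothetical games: predicted possessions, actual possessions, home MOV.
games = [
    {"pred": 66.0, "actual": 70.0, "mov": 15},
    {"pred": 68.0, "actual": 65.0, "mov": 3},
    {"pred": 64.0, "actual": 69.0, "mov": -14},
    {"pred": 67.0, "actual": 66.0, "mov": 9},
]

print(error_where(games, lambda m: m > 8))    # home team wins comfortably
print(error_where(games, lambda m: m < -12))  # away team wins a blowout
```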

Monday, August 15, 2011

More on Possessions

In a previous post, I mentioned that I was working on calculating the number of possessions in a game. There's no direct stat for this, so it has to be approximated by a formula from the existing box-score stats. The number of possessions in a college game averages about 67. (In the previous post I reported the average as 68, but that number was skewed by overtime games. I've since fixed that problem by normalizing to 40 minutes rather than per game.)
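As a sketch of what such an approximation can look like: one widely used estimate is field goal attempts minus offensive rebounds plus turnovers plus a fraction of free throw attempts. The 0.475 free-throw weight below is a common choice for college games and is an assumption here, not necessarily the formula used for the numbers in this post. The 40-minute normalization assumes 5-minute overtime periods.

```python
def estimate_possessions(fga, oreb, to, fta, ft_weight=0.475):
    """Common box-score approximation (assumed, not the post's exact formula)."""
    return fga - oreb + to + ft_weight * fta


def per_40(possessions, overtime_periods=0):
    """Scale a game's possessions to a 40-minute regulation game."""
    minutes = 40 + 5 * overtime_periods
    return possessions * 40 / minutes


# Hypothetical box score: 58 FGA, 11 offensive rebounds, 13 turnovers, 20 FTA.
raw = estimate_possessions(fga=58, oreb=11, to=13, fta=20)
print(raw)                                 # → 69.5
print(per_40(72, overtime_periods=1))      # → 64.0
```

Without the normalization, an overtime game counts its extra five minutes of possessions as if they happened in regulation, which pulls the average up.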

The next step is to try to predict the number of possessions when two teams play each other. Why would we want to do this? Consider the situation where Maryland beats Duke by 5 points. How strong is that evidence that Maryland is better than Duke? Well, if the final score was 45-40 we might consider that stronger evidence than if the score was 125-120. Number of possessions is also used to calculate "tempo-free statistics" which allow us to better make apples-to-apples comparisons between games that are played at different paces.
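To put rough numbers on the 45-40 versus 125-120 comparison, here is a tiny sketch. The possession counts (50 and 100) are invented for the example, and it assumes both teams had the same number of possessions.

```python
def margin_per_possession(points_a: float, points_b: float, possessions: float) -> float:
    """Scoring margin expressed per possession (same possessions for both teams)."""
    return (points_a - points_b) / possessions


# Hypothetical paces: a slow 50-possession game and a fast 100-possession game.
slow = margin_per_possession(45, 40, 50)
fast = margin_per_possession(125, 120, 100)
print(round(slow, 2), round(fast, 2))  # → 0.1 0.05
```

Under those assumed paces, the same 5-point win is twice as large per possession in the slow game.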

So how do we predict the number of possessions? The model I'm using right now supposes that each team has a preferred pace -- i.e., an ideal number of possessions. Some teams would like to play games with lots of running up and down the court and many possessions; others would like to play a very slow, controlled game. When two teams meet up, they each try to play the game at their preferred pace, and as a result the game is played somewhere in-between:

Poss_pred = (Preferred Poss_Home + Preferred Poss_Away)/2

Of course, we don't know the "preferred pace" for a team, so we have to try to discover that from the game data. One way to do that is gradient descent, as was used for Danny Tarlow's PMM. If we do that, and then test the predictor in the same way we've tested the others, we get this performance:

Predictor	Error
Possessions	5.20
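For anyone who wants to try the preferred-pace model, here is a minimal sketch of fitting it with stochastic gradient descent on squared error. It is not the code that produced the number above: the team names, learning rate, epoch count, starting value of 67, and the noise-free synthetic schedule are all assumptions for illustration.

```python
from itertools import permutations


def fit_preferred_pace(games, teams, lr=0.1, epochs=500, start=67.0):
    """Learn one preferred pace per team from (home, away, possessions) games."""
    pace = {t: start for t in teams}
    for _ in range(epochs):
        for home, away, actual in games:
            # Model: the game is played at the average of the two preferred paces.
            err = (pace[home] + pace[away]) / 2 - actual
            # Gradient of 0.5 * err**2 with respect to each team's pace is 0.5 * err.
            step = lr * 0.5 * err
            pace[home] -= step
            pace[away] -= step
    return pace


# Hypothetical teams with known "true" paces, playing every pairing home and away.
true_pace = {"A": 72.0, "B": 68.0, "C": 64.0, "D": 60.0}
games = [(h, a, (true_pace[h] + true_pace[a]) / 2)
         for h, a in permutations(true_pace, 2)]

learned = fit_preferred_pace(games, list(true_pace))
print({t: round(p, 1) for t, p in learned.items()})
```

On this noise-free toy schedule the learned values should end up very close to the true paces. Real game data is noisy, so the fit there can only be judged by held-out prediction error.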

Is that good performance? If we look at the distribution of possessions/game:

[Figure: distribution of possessions per game]

I'm inclined to say the predictor is "meh" -- not particularly good, but probably good enough to be useful.