## Tuesday, August 16, 2011

### Possessions, Continued

A few more experiments and thoughts about predicting the number of possessions in a game.

Following up on my previous post, I took a look at some variants of the model used there to predict possessions.  To start with, I looked at a variant model where one team had more control over the pace of the game:

$$\text{Possessions}_{\text{predicted}} = \text{Alpha} \cdot \text{PreferredPoss}_{\text{Home}} + (1 - \text{Alpha}) \cdot \text{PreferredPoss}_{\text{Away}}$$

The idea is that the home team might have more control over the pace of the game, analogous to the home-court scoring advantage, or perhaps the away team does. However, the results didn't indicate an advantage for either team: prediction performance got worse for any value of Alpha significantly different from 0.50.
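
The Alpha experiment can be sketched roughly as follows. This is a minimal illustration, not the code behind the results here: the function names, the sample games, and the choice of root-mean-square error are all assumptions (the error metric isn't specified above).

```python
import math

def predict_split(pref_home, pref_away, alpha=0.5):
    """Blend the two teams' preferred possession counts, weighted by alpha."""
    return alpha * pref_home + (1 - alpha) * pref_away

def rmse(predictions, actuals):
    """Root-mean-square error (an assumed metric, chosen for illustration)."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def sweep_alpha(games, alphas):
    """Return {alpha: error} for each candidate weighting."""
    actuals = [actual for _, _, actual in games]
    results = {}
    for alpha in alphas:
        preds = [predict_split(home, away, alpha) for home, away, _ in games]
        results[alpha] = rmse(preds, actuals)
    return results

# Hypothetical games: (home preferred pace, away preferred pace, actual possessions)
games = [(70.0, 64.0, 67.0), (62.0, 68.0, 66.0), (72.0, 66.0, 68.0)]

errors = sweep_alpha(games, [0.3, 0.4, 0.5, 0.6, 0.7])
best_alpha = min(errors, key=errors.get)
```

With real data, `games` would hold each team's estimated preferred pace going into the game, and the sweep would show whether any Alpha beats an even split.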

The next experiment was to use a different model altogether for predicted possessions:

$$\text{Possessions}_{\text{predicted}} = F_{\text{Home}} \cdot F_{\text{Away}}$$

There's no intuitive explanation for this model -- it just presumes that there's a multiplicative relationship between two underlying factors.  But it's quite different from our intuitive model (that the two teams essentially split the difference between their desired paces), so at a minimum, if this model worked poorly, it would be evidence that our intuitive model has some validity.

But interestingly enough, this model did just about as well as the split model:

| Predictor | Error |
|---|---|
| Possessions (Split Model) | 5.20 |
| Possessions (Multiplicative Model) | 5.22 |

This suggests to me that the intuitive approach may not be particularly valid.
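
For a sense of how the two functional forms compare numerically, here is a small sketch. It assumes, purely for illustration, that each team's factor is the square root of its preferred pace; how the factors were actually fit isn't described here, and the paces below are made up. Under that assumption the multiplicative prediction is the geometric mean of the two paces, while the even split is the arithmetic mean.

```python
import math

def predict_multiplicative(f_home, f_away):
    """Multiplicative form: the product of the two team factors."""
    return f_home * f_away

def predict_split(pref_home, pref_away):
    """Even split: the arithmetic mean of the two preferred paces."""
    return 0.5 * (pref_home + pref_away)

# Hypothetical preferred paces for one game
pref_home, pref_away = 70.0, 64.0

# Assumed parameterization: factor = sqrt(preferred pace)
f_home, f_away = math.sqrt(pref_home), math.sqrt(pref_away)

mult = predict_multiplicative(f_home, f_away)  # geometric mean of 70 and 64
split = predict_split(pref_home, pref_away)    # arithmetic mean of 70 and 64
gap = split - mult                             # small when the paces are similar
```

The arithmetic mean is never smaller than the geometric mean, and the gap shrinks as the two paces get closer together.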

So I took a step back and tried to characterize the range of performance for predicting possessions.  To start with, I created a predictor that simply predicts the average for the test data (~67):

| Predictor | Error |
|---|---|
| Possessions (67) | 6.30 |
| Possessions (Split Model) | 5.20 |
| Possessions (Multiplicative Model) | 5.22 |
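
A constant predictor like this takes only a few lines. The sketch below uses made-up possession counts and assumes root-mean-square error as the metric, which isn't specified above.

```python
import math

def mean_baseline(actuals):
    """Predict the average of the observed values for every game."""
    mean = sum(actuals) / len(actuals)
    return [mean] * len(actuals)

def rmse(predictions, actuals):
    """Root-mean-square error (an assumed metric, chosen for illustration)."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical possession counts for three test games
actuals = [60.0, 70.0, 71.0]

baseline_predictions = mean_baseline(actuals)
baseline_error = rmse(baseline_predictions, actuals)
```

Any model worth keeping should come in comfortably below `baseline_error` on the same games.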

The baseline shows that the split model is only about 1.1 possessions per game more accurate than just guessing the average.  As a second experiment, I ran a linear regression using all the teams as the attributes -- this generates a huge regression with 592 terms (essentially a Home term and an Away term for every team in Division I) with the following performance:

| Predictor | Error |
|---|---|
| Possessions (67) | 6.30 |
| Possessions (Split Model) | 5.20 |
| Possessions (Multiplicative Model) | 5.22 |
| Possessions (Regression on All Teams) | 4.72 |

I wouldn't expect much out of this model, but it does about half a possession better than the Split Model.  (It should be noted that this is not an apples-to-apples comparison to the other models; this simple regression uses the entire season for data, not just the season up to the predicted game.)  So I think there's clearly some work to be done in improving our model for predicting possessions.
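
For reference, here is one way the inputs to a regression on all teams might be laid out: one indicator column per team as the home side and one per team as the away side. This is only a sketch of the encoding. The team names, coefficients, and helper names are hypothetical, and the fitting step (any ordinary least-squares routine) is left out.

```python
def build_design_row(home, away, teams):
    """Encode one game as 2 * len(teams) indicators: home block, then away block."""
    index = {team: i for i, team in enumerate(teams)}
    row = [0.0] * (2 * len(teams))
    row[index[home]] = 1.0               # this team is playing at home
    row[len(teams) + index[away]] = 1.0  # this team is playing away
    return row

def predict(row, coefficients, intercept=0.0):
    """Linear prediction: intercept plus the coefficients of the active columns."""
    return intercept + sum(x * c for x, c in zip(row, coefficients))

# Hypothetical three-team league, giving six columns
teams = ["A", "B", "C"]
row = build_design_row("A", "C", teams)

# Hypothetical fitted coefficients, one per column
coefficients = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
prediction = predict(row, coefficients, intercept=60.0)
```

With two columns per team, 296 teams would give 592 columns, matching the term count mentioned above.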