## Wednesday, August 10, 2011

### More PMM

In my previous post, I replaced the prediction model from Danny Tarlow's PMM:

$$\mathrm{Pred}_i = \mathrm{Offense}_i \times \mathrm{Defense}_j$$

with this model:

$$\mathrm{Pred}_i = \mathrm{Offense}_i + \mathrm{Defense}_j$$

with predictably terrible results. There are other models we could try that would likely be more reasonable, but I want to detour a bit into a two-stage model.

The basic idea is that we predict the game outcome using the original Offense*Defense model, and minimize the error in the prediction across all the games using gradient descent.  However, we then add a second stage, where we attempt to predict the residual error between our best Offense*Defense model and the actual scores.  The value we're going to try to predict is:

$$\mathrm{Residual}_i = \mathrm{Score}_i - (\mathrm{Offense}_i \times \mathrm{Defense}_j)$$

Now there wouldn't be much sense in trying to predict this Residual by the same sort of Offense*Defense model -- if that would work, it would presumably be captured in our original model.  So we need to pick some different sort of model for the Residual, and in this case we'll use the additive model we used before, except applied this time to predict the Residual:

$$\mathrm{Pred\ Residual}_i = R_i + S_j$$

and we'll determine R and S by the same sort of gradient descent we use for Offense and Defense.  Our final predicted score will be:

$$\mathrm{Pred}_i = \mathrm{Offense}_i \times \mathrm{Defense}_j + R_i + S_j$$
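
To make the two stages concrete, here is a minimal sketch in plain Python. It is not the code behind the numbers in this post: the function names (`fit_primary`, `fit_residual`, `sse`), the squared-error loss, the plain batch gradient descent with no regularization, the learning rates, and the tiny made-up schedule are all assumptions chosen for illustration.

```python
def fit_primary(obs, n_teams, lr=0.001, epochs=5000):
    """Stage 1: fit score ~ offense[i] * defense[j] by batch gradient descent."""
    offense = [1.0] * n_teams
    defense = [1.0] * n_teams
    for _ in range(epochs):
        g_off = [0.0] * n_teams
        g_def = [0.0] * n_teams
        for i, j, score in obs:
            err = score - offense[i] * defense[j]
            g_off[i] += err * defense[j]   # descent direction for offense[i]
            g_def[j] += err * offense[i]   # descent direction for defense[j]
        for t in range(n_teams):
            offense[t] += lr * g_off[t]
            defense[t] += lr * g_def[t]
    return offense, defense


def fit_residual(obs, offense, defense, n_teams, lr=0.05, epochs=5000):
    """Stage 2: fit residual ~ r[i] + s[j], holding offense and defense fixed."""
    residuals = [(i, j, score - offense[i] * defense[j]) for i, j, score in obs]
    r = [0.0] * n_teams
    s = [0.0] * n_teams
    for _ in range(epochs):
        g_r = [0.0] * n_teams
        g_s = [0.0] * n_teams
        for i, j, res in residuals:
            err = res - (r[i] + s[j])
            g_r[i] += err
            g_s[j] += err
        for t in range(n_teams):
            r[t] += lr * g_r[t]
            s[t] += lr * g_s[t]
    return r, s


def sse(obs, predict):
    """Sum of squared errors of predict(i, j) over all observations."""
    return sum((score - predict(i, j)) ** 2 for i, j, score in obs)


# Made-up data: (i, j, score) means team i scored `score` against team j.
OBS = [
    (0, 1, 78), (1, 0, 65),
    (0, 2, 82), (2, 0, 70),
    (0, 3, 71), (3, 0, 74),
    (1, 2, 60), (2, 1, 68),
    (1, 3, 66), (3, 1, 72),
    (2, 3, 75), (3, 2, 69),
]
N_TEAMS = 4

offense, defense = fit_primary(OBS, N_TEAMS)
r, s = fit_residual(OBS, offense, defense, N_TEAMS)


def primary_only(i, j):
    return offense[i] * defense[j]


def two_stage(i, j):
    return offense[i] * defense[j] + r[i] + s[j]


print("SSE, primary model only:", sse(OBS, primary_only))
print("SSE, with residual model:", sse(OBS, two_stage))
```

Because the second stage starts from R = S = 0 and takes small batch steps on a convex problem, the training error in this sketch cannot end up worse than the primary model alone. That says nothing about out-of-sample performance, which is what the comparison below measures.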

Here's how that performs:

| Predictor | % Correct | MOV Error |
| --- | --- | --- |
| PMM | 71.7% | 11.23 |
| PMM (w Residual Prediction) | 72.0% | 11.19 |

The improvement isn't huge, but it does show some promise.  Intuitively, if we think of the performance of a team as a sum of a number of different factors plus some noise, then different models may be capable of accurately modeling different factors.  Some factors may be well-modeled by "Offense*Defense" while others are better modeled by "Offense+Defense".

There are several avenues to explore from here.  One is to look at other alternate models, for both the primary model and the residual model.  Another is to look at combining the two models, so we optimize both at once -- this would have some advantages if the two models are interdependent.  Another interesting notion is to use an entirely different approach -- say, TrueSkill -- for either the primary or the residual model.
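
As a rough illustration of the "optimize both at once" idea, and assuming a squared-error loss (my assumption for this sketch, not something fixed by the model itself), the combined fit would minimize a single objective over Offense, Defense, R and S together, summing over every (team i, opponent j) score:

$$E = \sum_{(i,j)} e_{ij}^2, \qquad e_{ij} = \mathrm{Score}_i - (\mathrm{Offense}_i \times \mathrm{Defense}_j + R_i + S_j)$$

The gradients then share the same error term, which is where the interdependence shows up:

$$\frac{\partial E}{\partial \mathrm{Offense}_i} = -2 \sum_j e_{ij}\,\mathrm{Defense}_j, \qquad \frac{\partial E}{\partial R_i} = -2 \sum_j e_{ij}$$

with the analogous expressions for Defense and S. In the two-stage version, by contrast, Offense and Defense are frozen before R and S ever see the error.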