I mentioned here that Brady West normalizes all the input data to his model by subtracting the mean and dividing by the standard deviation; the result is called a "standard score." Instead of knowing that the home team scored 108 points, you'd know that they scored 2.38 standard deviations above the mean. That sounds like a fine approach to me. RapidMiner (the tool I'm using to build the predictive models) doesn't offer an option by that name, but it does offer a z-transformation, which amounts to the same thing: it transforms the data so that it has a mean of zero and a standard deviation of 1. If we apply that to all of our inputs, we get more of an apples-to-apples comparison. For example, the home scoring average ends up ranging from -9.96 to 3.99, while the away team's FT percentage ranges from -14.34 to 4.87. Since both now have a standard deviation of 1, the wider range for FT percentage points to more extreme outliers (a few teams far below the norm) rather than more variance overall.
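The standard score is just z = (x - mean) / standard deviation. Here's a minimal Python sketch of that transformation, assuming a column of inputs is a plain list of floats. It uses the sample standard deviation; I'm not certain whether RapidMiner uses the sample or population version internally, and the scoring averages below are made up for illustration.

```python
from statistics import mean, stdev


def z_transform(values):
    """Return standard scores: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption; a tool may use pstdev instead)
    return [(x - mu) / sigma for x in values]


# Hypothetical per-team scoring averages, for illustration only.
scoring_averages = [62.0, 68.5, 71.0, 74.5, 80.0]
z = z_transform(scoring_averages)
print([round(v, 2) for v in z])
```

Whatever the original units (points, percentages), every transformed column ends up with mean 0 and standard deviation 1, which is what makes the inputs comparable.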
If we apply the z-transformation to our inputs, the performance of the model that uses only scoring averages doesn't change. That's reasonable, since the scoring averages are all on roughly the same scale anyway. But when we add a second input with a different scale (3-point percentage), a difference shows up:
Predictor | % Correct | MOV Error |
---|---|---|
Govan + Averaging | 73.5% | 10.80 |
Scoring averages | 72.1% | 11.18 |
Scoring + 3 pt % -- Without normalization | 72.1% | 11.18 |
Scoring + 3 pt % -- With normalization | 72.1% | 11.09 |
So from now on I'll include a normalization step in the prediction workflow as a matter of course. (In this case the gain is small: MOV error drops from 11.18 to 11.09, the percentage correct doesn't change, and the model still trails Govan + Averaging.)
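One detail that matters once normalization is a routine workflow step: the mean and standard deviation should be learned from the training games and then reused, unchanged, on the games being predicted. A minimal sketch, assuming each game is a dict of named numeric features; the class name, feature names, and values are hypothetical and are not RapidMiner's API.

```python
from statistics import mean, stdev


class ZNormalizer:
    """Learns each feature's mean and standard deviation from training rows,
    then applies that same z-transformation to any later rows."""

    def fit(self, rows):
        self.params = {}
        for feature in rows[0]:
            column = [row[feature] for row in rows]
            # Fall back to 1.0 so a constant feature doesn't cause division by zero.
            sigma = stdev(column) or 1.0
            self.params[feature] = (mean(column), sigma)
        return self

    def transform(self, rows):
        return [
            {f: (row[f] - mu) / sigma for f, (mu, sigma) in self.params.items()}
            for row in rows
        ]


# Hypothetical training games (feature names and values are made up for illustration).
train = [
    {"home_avg": 70.0, "home_3pt": 0.33},
    {"home_avg": 75.0, "home_3pt": 0.36},
    {"home_avg": 80.0, "home_3pt": 0.39},
]
normalizer = ZNormalizer().fit(train)

# A game to predict is scaled with the training statistics, not its own.
upcoming = [{"home_avg": 75.0, "home_3pt": 0.39}]
print(normalizer.transform(upcoming))
```

Fitting on the training set only keeps information about the games being predicted from leaking into the scaling.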
It's also interesting to compare the coefficients in our linear regression. This is what we see if we look at the coefficients for the various scoring averages:
Datum | Coefficient |
---|---|
Home Team Scoring Average | 5.886 |
Away Team's Opponent Scoring Average | -4.447 |
Away Team Scoring Average | -5.686 |
Home Team's Opponent Scoring Average | 4.793 |
Naively, you might predict a team's score as exactly halfway between what the team usually scores (its offense) and what the other team usually gives up (the opponent's defense). But comparing the sizes of the coefficients shows that the best estimate actually weights offense slightly more: 5.886 against 4.447 (57%) for the home team, and 5.686 against 4.793 (54%) for the away team.
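The percentages come from dividing the offense coefficient's magnitude by the combined magnitude of the offense and defense coefficients. A small sketch of that arithmetic, using the values from the coefficient table. The pairing (home offense with the away team's opponent scoring average, away offense with the home team's opponent scoring average) is my reading of how the 57% and 54% figures were derived; it does reproduce them.

```python
# Signed coefficients from the linear regression table.
home_offense = 5.886    # Home Team Scoring Average
away_defense = -4.447   # Away Team's Opponent Scoring Average
away_offense = -5.686   # Away Team Scoring Average
home_defense = 4.793    # Home Team's Opponent Scoring Average


def offense_weight(offense, defense):
    """Share of the combined coefficient magnitude that goes to offense."""
    return abs(offense) / (abs(offense) + abs(defense))


home_weight = offense_weight(home_offense, away_defense)
away_weight = offense_weight(away_offense, home_defense)
print(f"home offense weight: {home_weight:.0%}")  # prints "home offense weight: 57%"
print(f"away offense weight: {away_weight:.0%}")  # prints "away offense weight: 54%"
```

Taking absolute values sidesteps the signs, which reflect whether each input pushes the margin of victory toward the home team or the away team.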