Sokol and Kvam have written several papers describing the LRMC model (one is available here). The basic notion is similar to the Random Walkers model. Each team has a certain number of votes, and in each iteration we move some of those votes between teams based upon past game results. In the Random Walkers model, we move votes based upon whether a team won or lost a game. In LRMC, we move votes based upon the margin of victory.
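To make the mechanics concrete, here is a minimal sketch of the vote-moving iteration. The data layout and names (`iterate_votes`, `share_a`) are my own illustration, not the code behind the numbers in this post:

```python
import numpy as np
from collections import defaultdict

def iterate_votes(num_teams, games, iterations=1000):
    """games: list of (team_a, team_b, share_a) tuples, where share_a is
    the fraction of the votes riding on that game that team_a earns."""
    games_played = defaultdict(int)                # team -> number of games
    for a, b, _ in games:
        games_played[a] += 1
        games_played[b] += 1

    votes = np.full(num_teams, 1.0 / num_teams)    # start everyone equal
    for _ in range(iterations):
        new_votes = np.zeros(num_teams)
        for a, b, share_a in games:
            # Each team spreads its votes evenly across its own games, so
            # both participants put a slice of their votes at stake here.
            stake = votes[a] / games_played[a] + votes[b] / games_played[b]
            new_votes[a] += share_a * stake
            new_votes[b] += (1.0 - share_a) * stake
        votes = new_votes
    return votes    # bigger vote totals = stronger teams
```

For Random Walkers, share_a would reflect only who won; for LRMC it comes from the RH(x) function derived below.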
In [Sokol 2006], the authors derive the following function to estimate the probability that Team A will beat Team B on a neutral site given that A beat B by "x" points on A’s court:
RH(x) = exp(0.0292x - 0.6228) / (1 + exp(0.0292x - 0.6228))

(The numeric factors in this equation were derived from a logistic regression on home-and-home matchups; the details appear in the paper.)
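In code, RH(x) is a direct transcription of the formula (a sketch; the function name is mine):

```python
import math

def rh_2006(x):
    """Probability that the winner is the better team, given a win by
    x points at home, per the [Sokol 2006] logistic regression."""
    z = 0.0292 * x - 0.6228
    return math.exp(z) / (1.0 + math.exp(z))
```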
If we realize that "Team A will beat Team B on a neutral site" means the same thing as "Team A is better than Team B", then RH(x) gives us the probability that A is really better than B. We then use this probability to move "votes" between the two teams.
Of course, few NCAA basketball games take place on a neutral court, so we have to adjust our calculation to account for the home court advantage (HCA). [Sokol 2006] calculates the HCA at 10.5 points (a large value out of line with other analysts' estimates; we'll return to this in a moment), so we have to correct for the HCA when calculating RH(x) for the home team. Since RH(x) was fit to home-and-home matchups, its break-even point sits at about 21 points (two games' worth of HCA), and the correction is to add the HCA to the home team's margin. If A beat B by 15 points at home, then RH(15+10.5) = 0.530, and A gets 53% of the "votes" that ride on this game.
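Using the hypothetical rh_2006() from the sketch above, the worked example looks like this:

```python
# A beat B by 15 points at home; the HCA is 10.5 points.
home_margin = 15.0
hca = 10.5
share_a = rh_2006(home_margin + hca)   # ~0.53 of the votes on this game
share_b = 1.0 - share_a                # ~0.47 stays with B
```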
If we plug RH(x) into our Random Walkers model, we get this performance:
| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| LRMC [2006] | 71.3% | 11.65 |
This performance is on par with standard RPI. One concern with this approach is that even if the home team wins by 40 points, it can only garner about 70% of the "votes", because the logistic function approaches its limits very slowly. Most college basketball fans would probably consider a 40-point win near-certain proof that Team A is better than Team B. So rather than leave the away team a floor of about 30%, we can split that remaining 30% between the two teams (in effect, +15% to the home team) or assign it all to the home team (+30%). These approaches produce this performance:
| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| LRMC [2006] | 71.3% | 11.65 |
| LRMC [2006] +15% to home | 70.5% | 11.62 |
| LRMC [2006] +30% to home | 66.8% | 12.57 |
Neither of these proves to be an improvement.
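For concreteness, here is one way those reallocation variants might be coded, again using the hypothetical rh_2006(); the bonus parameter is my own framing of the "+15%" and "+30%" rows above:

```python
def home_share(margin, hca=10.5, bonus=0.0):
    """Home team's share of the votes on a game it won by `margin` points.
    bonus=0.15 splits the leftover ~30% evenly between the teams;
    bonus=0.30 assigns it all to the home team."""
    share = rh_2006(margin + hca)
    return min(share + bonus, 1.0)   # never award more than all the votes
```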
[Sokol 2010] experimented with replacing the "RH" function derived by logistic regression with other models, and found that an empirical Bayes model worked better. (Technically, that makes the name LRMC no longer appropriate.) Part of the motivation for this change was that the 10.5 point home advantage found by the logistic regression was considerably different from everyone else's estimates of the HCA. With the empirical Bayes model, the HCA comes out in the range of 2 to 4 points, in line with other estimates. The RH function for the new model is:
RH(x) = phi(0.0189x - 0.0756)

where phi is the standard normal CDF. Plugging this into our Random Walkers model gives this performance:
| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| LRMC [2006] | 71.3% | 11.65 |
| LRMC [2010] | 71.8% | 11.40 |
This does prove to be an advantage over the original RH(x) function, but still not competitive with our best non-MOV predictor.
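For completeness, here is a sketch of the replacement function, assuming phi is the standard normal CDF (as above, the names are mine):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def rh_2010(x):
    """[Sokol 2010] empirical-Bayes version of RH, per the formula above."""
    return phi(0.0189 * x - 0.0756)
```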