Recall that in TrueSkill we update the ratings for the two teams involved in a game by comparing their strengths. If you win a game over a strong opponent, then that's good evidence that your rating ought to rise and your opponent's ought to fall. And if you win a game over a weak opponent, then that's not good evidence to change the ratings (because you were expected to win).
So how should we interpret MOV? One reasonable approach is to say that a win by a large MOV is better evidence that your rating should rise than a win by a small MOV. (For the moment we ignore "Running Up the Score" and similar problems with MOV.) Referring back to how TrueSkill works, winning by a large MOV is therefore similar to beating a stronger team. So perhaps we can incorporate MOV into the TrueSkill algorithm by adjusting our opponent's rating up or down based upon MOV (creating an "effective" rating) and then updating our own rating accordingly.
That turns out to be pretty straightforward to add to the algorithm, and gives these results:
Predictor | % Correct | MOV Error |
---|---|---|
TrueSkill + iRPI | 72.9% | 11.01 |
Govan (best) | 73.5% | 10.80 |
TrueSkill (w/ MOV) | 73.3% | 10.91 |
This turns out to work surprisingly well for a completely arbitrary hack. Some playing around with the bonus function shows that MOV error improves slightly (from 10.91 to 10.88, with % correct unchanged) when MOV*2 is used as the bonus:
Predictor | % Correct | MOV Error |
---|---|---|
TrueSkill + iRPI | 72.9% | 11.01 |
Govan (best) | 73.5% | 10.80 |
TrueSkill (w/ MOV) | 73.3% | 10.91 |
TrueSkill (w/ MOV*2) | 73.3% | 10.88 |
Other bonusing variants and tweaks don't show any improvement. This performance is not quite as good as the Govan rating, but certainly shows some promise.
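For readers who want to experiment, here is a minimal sketch of the "effective rating" idea. It is not necessarily the code behind the numbers above, and it makes several assumptions: the bonus is simply added to (or subtracted from) the opponent's mean, each team is treated as a single rated entity, draws and TrueSkill's dynamics term are ignored, and the ratings use the conventional TrueSkill defaults (mean 25, sigma 25/3, beta 25/6). The names `Rating`, `update_with_mov`, and `scale` are made up for illustration. With `scale=1.0` the bonus is MOV; with `scale=2.0` it is MOV*2.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # conventional TrueSkill default (an assumption here)


@dataclass
class Rating:
    mu: float = 25.0           # estimated skill
    sigma: float = 25.0 / 3.0  # uncertainty in that estimate


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal CDF, via erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update_with_mov(winner: Rating, loser: Rating, mov: float,
                    scale: float = 1.0, beta: float = BETA):
    """Two-team, no-draw TrueSkill update with an MOV bonus (hypothetical form).

    The winner sees the loser as `scale * mov` stronger than rated, and the
    loser sees the winner as `scale * mov` weaker than rated. Both views give
    the same adjusted skill gap, so one value of t serves both updates.
    """
    bonus = scale * mov
    c = math.sqrt(2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2)
    t = (winner.mu - loser.mu - bonus) / c  # adjusted gap, in units of c
    v = _pdf(t) / _cdf(t)                   # size of the mean shift
    w = v * (v + t)                         # size of the variance shrink

    new_winner = Rating(
        mu=winner.mu + (winner.sigma ** 2 / c) * v,
        sigma=winner.sigma * math.sqrt(1.0 - (winner.sigma ** 2 / c ** 2) * w),
    )
    new_loser = Rating(
        mu=loser.mu - (loser.sigma ** 2 / c) * v,
        sigma=loser.sigma * math.sqrt(1.0 - (loser.sigma ** 2 / c ** 2) * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = Rating(), Rating()
    for mov in (0, 5, 20):
        new_a, new_b = update_with_mov(a, b, mov, scale=2.0)
        print(mov, round(new_a.mu, 2), round(new_b.mu, 2))
```

Setting `mov=0` gives the plain no-draw update, and a larger bonus shrinks the adjusted gap, so the winner gains more and the loser drops more. Note that MOV is in points while the ratings are on an arbitrary scale, so the right `scale` depends on how the ratings are set up.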