Monday, May 2, 2011

Iterative Strength Rating (ISR)

(Note:  My original implementation of the ISR had an error.  See "Whoops!")

Having found some success with Trueskill, we'll now look at a second RPI alternative, called "Iterative Strength Rating."  This rating was developed by Boyd Nation for rating college baseball teams.  Nation describes the ISR algorithm thus:
Begin with all teams set to an even rating -- 100 in this case. Then, for each game played, give each team the value of their opponent's rating plus or minus a factor for winning or losing the game -- 25 in this case. Total all of a team's results, divide by the number of games played, and that's the end of a cycle. Then use those numbers as the start of the next cycle until you get the same results for each team for two consecutive cycles.
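The procedure Nation describes can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the implementation used for the results in this post: the function name `isr_ratings`, the `(winner, loser)` game layout, the tolerance `tol` (standing in for "the same results for two consecutive cycles"), and the `max_cycles` cap are all my own choices.

```python
from collections import defaultdict


def isr_ratings(games, start=100.0, bonus=25.0, tol=1e-9, max_cycles=10_000):
    """Sketch of the ISR cycle described in the quote above.

    games: list of (winner, loser) pairs (hypothetical layout).
    start and bonus are the 100 and 25 from Nation's description.
    tol and max_cycles are assumptions: "same results" is approximated
    by "no rating moved more than tol", and the cap guards against
    schedules where the iteration never settles.
    """
    teams = {team for game in games for team in game}
    ratings = {team: start for team in teams}

    for _ in range(max_cycles):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for winner, loser in games:
            # Each team is credited with its opponent's rating,
            # plus the bonus for a win or minus the bonus for a loss.
            totals[winner] += ratings[loser] + bonus
            totals[loser] += ratings[winner] - bonus
            counts[winner] += 1
            counts[loser] += 1

        # End of a cycle: average each team's results over its games.
        new_ratings = {team: totals[team] / counts[team] for team in teams}
        largest_change = max(abs(new_ratings[t] - ratings[t]) for t in teams)
        ratings = new_ratings
        if largest_change < tol:
            break

    return ratings
```

For a three-team round robin in which A beats both B and C, and B beats C, `isr_ratings([("A", "B"), ("B", "C"), ("A", "C")])` settles at roughly 116.7 for A, 100 for B, and 83.3 for C.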
So the ISR is an iterative algorithm similar to what we used for the "infinitely deep" RPI.  Implementing this and testing with our standard cross-validation methodology produces these results:

Predictor                    % Correct    MOV Error
1-Bit                        62.6%        14.17
Trueskill (draw=8 points)    72.8%        11.09
ISR                          77.7%        10.45

This algorithm substantially outperforms our previous best algorithm (Trueskill).  It gets about 5 percentage points more games correct (77.7% vs. 72.8%) and pushes the MOV error down from 11.09 to 10.45 points.  That's very impressive performance from a very simple approach.

There are limited opportunities for tweaking ISR.  As with our other models, we're using a linear regression, so the home court advantage (HCA) is efficiently modeled within the regression.  The only tweak that obviously applies is to filter out some of the training data with an MOV cutoff, as we did with success for both RPI and Trueskill:

Predictor                    % Correct    MOV Error
1-Bit                        62.6%        14.17
Trueskill (draw=8 points)    72.8%        11.09
ISR                          77.7%        10.45
ISR (mov=1)                  77.1%        10.48
ISR (mov=4)                  76.2%        10.49

Unfortunately, for ISR, eliminating close games from the training set does not improve predictive performance.  However, we've set a new high-water mark for performance from W-L only ratings.  If ISR remains the champion after we look at other algorithms, we'll put more effort into seeing whether its performance can be improved further.
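For readers who want to experiment, here is a rough sketch of the MOV cutoff and the regression step.  It rests on several assumptions that this post does not spell out: each training game is a hypothetical `(home_rating, away_rating, home_margin)` tuple, the cutoff drops games decided by `mov_cutoff` points or fewer, and the regression uses the rating difference as its single predictor, so that the intercept plays the role of the HCA.  The function names are made up for illustration.

```python
def filter_by_mov(games, mov_cutoff):
    """Keep only games decided by more than mov_cutoff points.

    games: list of (home_rating, away_rating, home_margin) tuples,
    a hypothetical layout.  Whether the cutoff itself is inclusive
    is an assumption here.
    """
    return [game for game in games if abs(game[2]) > mov_cutoff]


def fit_mov_model(games):
    """Least-squares fit of home_margin = slope * (home - away) + hca.

    Using the rating difference as the only predictor is an assumption;
    the intercept is the predicted margin between equally rated teams,
    i.e. the home court advantage.
    """
    xs = [home - away for home, away, _ in games]
    ys = [margin for _, _, margin in games]
    n = len(games)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form simple linear regression.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    hca = mean_y - slope * mean_x
    return slope, hca
```

Training with a cutoff would then be `fit_mov_model(filter_by_mov(training_games, 4))`, with the fitted slope and intercept applied to the ratings of the held-out games.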