## Friday, April 15, 2011

In the previous post, we started looking at RPI and found that it was a considerably better predictor than the 1-Bit Predictor. However, RPI has several obvious shortcomings.  Will fixing these improve its performance as a predictor?  Let's see!

The first area we can look at improving is accounting for Home Court Advantage.  Recall that we previously showed that HCA gives the home team about a 4.5-point advantage, or overall a +30% chance of winning.  In 1981, the NCAA added a correction to the RPI formula to account for this advantage.  The correction weights a team's wins and losses differently depending upon where they were played.  A home win is only worth 0.6 "wins", while a road win is worth 1.4 "wins".  Conversely, a home loss costs 1.4 "losses", while a road loss is only 0.6 "losses".

There are a couple of potential problems with this approach.  First, the RPI formula applies this weighting only to the winning percentage (WP) of the team being rated.  It is not used in calculating the opponents' winning percentage (OWP) or the opponents' opponents' winning percentage (OOWP), so the OWP and OOWP are potentially biased by the HCA.  Second, the chosen weighting (0.6/1.4, a 40% adjustment in each direction) doesn't appear to reflect the actual home court advantage, which is closer to 30%.
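To make both shortcomings concrete, here is a minimal Python sketch of the RPI as described above: the 0.6/1.4 location weights enter only the rated team's own WP, while OWP and OOWP are built from plain winning percentages.  The `(home, away, home_won)` game format and all function names are hypothetical choices for illustration, neutral-site games are not modeled, and the official OWP calculation also excludes games against the rated team, a step this sketch skips.

```python
from collections import defaultdict

# Games are (home, away, home_won) tuples: a hypothetical format.
# Neutral-site games are not modeled in this sketch.


def results_by_team(games):
    """Map each team to a list of (won, at_home) results."""
    results = defaultdict(list)
    for home, away, home_won in games:
        results[home].append((home_won, True))
        results[away].append((not home_won, False))
    return results


def opponents_by_team(games):
    """Map each team to the opponents it played (repeat meetings kept)."""
    opps = defaultdict(list)
    for home, away, _ in games:
        opps[home].append(away)
        opps[away].append(home)
    return opps


def winning_pct(results, home_win=1.0, road_win=1.0):
    """Winning percentage with optional location weights.

    A home win earns home_win and a road win earns road_win; a home loss
    costs road_win and a road loss costs home_win. Defaults give plain WP.
    """
    wins = losses = 0.0
    for won, at_home in results:
        if won:
            wins += home_win if at_home else road_win
        else:
            losses += road_win if at_home else home_win
    total = wins + losses
    return wins / total if total else 0.0


def rpi(games, home_win=0.6, road_win=1.4):
    """Simplified RPI = 0.25*WP + 0.50*OWP + 0.25*OOWP."""
    results, opps = results_by_team(games), opponents_by_team(games)
    # The location weighting touches only the rated team's own WP...
    wp = {t: winning_pct(r, home_win, road_win) for t, r in results.items()}
    # ...while OWP and OOWP use plain, unweighted winning percentages.
    plain = {t: winning_pct(r) for t, r in results.items()}
    owp = {t: sum(plain[o] for o in ops) / len(ops) for t, ops in opps.items()}
    oowp = {t: sum(owp[o] for o in ops) / len(ops) for t, ops in opps.items()}
    return {t: 0.25 * wp[t] + 0.50 * owp[t] + 0.25 * oowp[t] for t in results}
```

With this sketch, `rpi(games)` uses the NCAA weights, `rpi(games, 0.7, 1.3)` the variant tried below, and `rpi(games, 1.0, 1.0)` the unweighted RPI.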

Let's see if changing the weightings in the calculation of WP to 0.7/1.3 (closer to the HCA I measured) results in any improvement. Making this change and testing gives this result:

| Predictor | % Correct | MOV Error |
|-----------|-----------|-----------|
| 1-Bit | 62.6% | 14.17 |
| RPI | 73.2% | 11.62 |
| RPI (0.7/1.3) | 73.4% | 11.58 |

A very slight improvement.  This is not too surprising -- this change only has a small effect on Winning Percentage, which is only 25% of a team's RPI.  Perhaps the NCAA's approach to HCA doesn't have much impact at all?
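A quick back-of-the-envelope check supports this.  Because the location weights enter only WP, and WP carries a 25% weight, any change in WP is damped by a factor of four.  For a hypothetical 16-10 team with 12 home wins, 4 road wins, 2 home losses, and 8 road losses:

```latex
\begin{aligned}
\mathrm{RPI} &= 0.25\,\mathrm{WP} + 0.50\,\mathrm{OWP} + 0.25\,\mathrm{OOWP}
  \;\Rightarrow\; \Delta\mathrm{RPI} = 0.25\,\Delta\mathrm{WP} \\
\mathrm{WP}_{0.6/1.4} &= \frac{12(0.6) + 4(1.4)}{12(0.6) + 4(1.4) + 2(1.4) + 8(0.6)}
  = \frac{12.8}{20.4} \approx 0.6275 \\
\mathrm{WP}_{0.7/1.3} &= \frac{12(0.7) + 4(1.3)}{12(0.7) + 4(1.3) + 2(1.3) + 8(0.7)}
  = \frac{13.6}{21.8} \approx 0.6239 \\
\Delta\mathrm{RPI} &\approx 0.25 \times (0.6239 - 0.6275) \approx -0.0009
\end{aligned}
```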

I should have learned my lesson last time, so let's pause a moment to make sure that HCA really is a problem.  To do this, we'll run a quick experiment using no weighting (i.e., 1.0/1.0) to see how much improvement the NCAA's approach to HCA is actually providing.  Here is the performance with no weighting:

| Predictor | % Correct | MOV Error |
|-----------|-----------|-----------|
| 1-Bit | 62.6% | 14.17 |
| RPI | 73.2% | 11.62 |
| RPI (0.7/1.3) | 73.4% | 11.58 |
| RPI (1.0/1.0) | 74.6% | 11.53 |

Surprise!  The NCAA's correction for the home court advantage actually seems to make the RPI perform worse.  Hence the value of testing everything -- sometimes intuitively correct notions turn out to be incorrect.  Further experiments with a variety of weightings confirm that the unweighted RPI performs better than any weighted variant.
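The % Correct column can be computed along these lines: for each game, pick the higher-rated team and count how often that pick wins.  This is a minimal sketch; the post doesn't describe which games the ratings are computed from or how MOV Error is derived, so both are left out.  The function name and the `(home, away, home_won)` game format are hypothetical.

```python
def percent_correct(ratings, games):
    """Share of games won by the higher-rated team.

    ratings: dict mapping team -> rating (for example, an RPI value).
    games: (home, away, home_won) tuples, a hypothetical format.
    A rating tie counts as a pick for the home team (an arbitrary choice).
    """
    if not games:
        return 0.0
    correct = 0
    for home, away, home_won in games:
        picked_home = ratings[home] >= ratings[away]
        if picked_home == home_won:
            correct += 1
    return correct / len(games)
```

To repeat the weighting sweep, one would compute ratings once per weighting (0.6/1.4, 0.7/1.3, 1.0/1.0, and so on) and compare `percent_correct` across them on the same set of games.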

A different approach to accounting for HCA is to adjust game outcomes using the HCA in points.  That is, we'll subtract the HCA (say, 4.5 points) from the home team's margin before determining who "won" the game.  So when Duke wins by 3 at home against North Carolina, the adjusted margin is 3 - 4.5 = -1.5, and that game counts as a win for North Carolina when calculating RPI.  (And we'll carry this through all levels of the RPI calculation to avoid the OWP/OOWP shortcoming noted earlier.)  This is equivalent to moving every game to a neutral site (but cheaper).
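A minimal sketch of this adjustment, assuming games are recorded as `(home, away, home_pts, away_pts)` score tuples; the function names, the `neutral` flag, and the record format are hypothetical.  The resulting `(home, away, home_won)` tuples would then feed every level of the calculation (WP, OWP, and OOWP).

```python
def adjusted_winner(home, away, home_pts, away_pts, hca=4.5, neutral=False):
    """Team credited with the win once home court advantage is removed.

    The home team's margin is reduced by hca points (no reduction at a
    neutral site). With integer scores, a half-point hca such as 4.5 or
    3.5 can never produce an adjusted margin of exactly zero; if one
    occurs anyway, the visitor is credited here (an arbitrary choice).
    """
    margin = home_pts - away_pts
    if not neutral:
        margin -= hca
    return home if margin > 0 else away


def neutralize(scored_games, hca=4.5):
    """Turn (home, away, home_pts, away_pts) into (home, away, home_won)."""
    return [
        (home, away, adjusted_winner(home, away, hp, ap, hca) == home)
        for home, away, hp, ap in scored_games
    ]
```

For example, with hypothetical scores, `adjusted_winner("Duke", "North Carolina", 75, 72)` returns `"North Carolina"`, matching the Duke example above.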

That change provides this performance:

| Predictor | % Correct | MOV Error |
|-----------|-----------|-----------|
| 1-Bit | 62.6% | 14.17 |
| RPI (unweighted) | 74.6% | 11.53 |
| RPI (HCA=4.5) | 73.6% | 11.61 |

This is not an improvement over RPI with no weighting.  Experiments with other values for the HCA also do not improve performance (HCA=3.5 does the best, though).  So it appears that the RPI does not benefit from adjustments to eliminate the HCA -- a somewhat surprising result!

My vague intuition about this result is that the HCA is essentially "washed out" of the RPI because the majority of teams play home-and-home series within their conferences.  So any home advantage is gained equally by every team, and any attempt to compensate within the RPI formula just adds error.
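A little arithmetic is consistent with this intuition.  Take a single home-and-home series (one game at each site) under the NCAA's 0.6/1.4 weights.  Every possible outcome gives the same weighted WP for that series as the unweighted one:

```latex
\begin{aligned}
\text{sweep:}\quad & \frac{0.6 + 1.4}{0.6 + 1.4} = 1 = \tfrac{2}{2} \\
\text{split, winning at home:}\quad & \frac{0.6}{0.6 + 0.6} = 0.5 = \tfrac{1}{2} \\
\text{split, winning on the road:}\quad & \frac{1.4}{1.4 + 1.4} = 0.5 = \tfrac{1}{2} \\
\text{swept:}\quad & \frac{0}{1.4 + 0.6} = 0 = \tfrac{0}{2}
\end{aligned}
```

Over a full season the weighted WP is a ratio of summed credits, so the weights can still nudge it when a team's home and road results differ across series, but within any one home-and-home series the adjustment cancels out.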

We'll return in a bit to an alternative approach for HCA suggested by Dick Vitale.  But since HCA doesn't appear to be a significant problem, first we'll detour into a couple of other possible improvements to RPI.