"...a necessary and sufficient condition for an ensemble of classifiers to be more accurate than any of its individual members is if the classifiers are accurate and diverse."A classifier is accurate if is better than random guessing. Two predictors are diverse if they make different errors. Intuitively, an ensemble will perform better than the base predictors if the errors in the base predictors are uncorrelated and tend to cancel each other out. Our predictors are all obviously accurate, but are they diverse?
To test this we can measure the correlation between the errors made by the different predictors. If they are uncorrelated, then it is likely that we can construct an ensemble with improved performance. I don't have the time and energy to test all combinations of the predictors I've implemented, but here are the correlations between the top two won-loss based predictors (Wilson, iRPI) and the top two MOV-based predictors (TrueSkill+MOV, Govan):
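As a rough illustration of the measurement, here is a minimal sketch in Python with NumPy. It assumes each predictor produces a predicted margin of victory per game and that the "error" is the predicted margin minus the actual margin; the function name `error_correlations`, the predictor names "A" and "B", and the synthetic data are all hypothetical and are not the real predictors or results.

```python
import numpy as np

def error_correlations(predictions, actual):
    """Return predictor names and the Pearson correlation matrix of their errors.

    predictions: dict mapping predictor name -> predicted margins, one per game
    actual: actual margins of victory, same length and game order
    """
    names = list(predictions)
    actual = np.asarray(actual, dtype=float)
    # One row of errors per predictor (assumed: error = predicted - actual).
    errors = np.array([np.asarray(predictions[n], dtype=float) - actual
                       for n in names])
    # np.corrcoef treats each row as one variable.
    return names, np.corrcoef(errors)

# Synthetic stand-in data: two predictors that share most of their error.
rng = np.random.default_rng(0)
actual = rng.normal(4.5, 11.0, size=500)
shared = rng.normal(0.0, 10.0, size=500)   # error component common to both
preds = {
    "A": actual + shared + rng.normal(0.0, 2.0, size=500),
    "B": actual + shared + rng.normal(0.0, 2.0, size=500),
}
names, corr = error_correlations(preds, actual)
print(names)
print(np.round(corr, 2))
```

Because the two synthetic predictors share most of their error, the off-diagonal entry comes out close to 1, which is the same pattern the real predictors show.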
| | Wilson | iRPI | TrueSkill+MOV |
|---|---|---|---|
| iRPI | 0.99 | | |
| TrueSkill+MOV | 0.93 | 0.93 | |
| Govan | 0.95 | 0.95 | 0.98 |
Not surprisingly, the highest correlations are between the two won-loss predictors and between the two MOV-based predictors. But all of the predictors are highly correlated. The least correlated (by a hair) are Wilson and TrueSkill+MOV. Putting those two predictors into a combined linear regression or an averaging ensemble results in performance worse than TrueSkill+MOV alone.
On the other hand, perhaps using the best predictors is the wrong course. Perhaps it's more likely that the worst predictors are uncorrelated with the best predictors, and a combination of one of the worst with one of the best would be fruitful.
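A back-of-the-envelope calculation suggests why so little is gained from averaging at these correlation levels. This is only a sketch under assumptions not stated in the post: the two predictors have zero-mean errors $e_1$ and $e_2$ with standard deviations $\sigma_1$ and $\sigma_2$ and error correlation $\rho$, and the ensemble is a simple average.

$$
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right) = \frac{\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2}{4}
$$

If both predictors are equally good ($\sigma_1 = \sigma_2 = \sigma$), this reduces to

$$
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right) = \sigma^2\,\frac{1 + \rho}{2}
$$

so uncorrelated errors ($\rho = 0$) would halve the error variance, while $\rho = 0.93$ only reduces it to $0.965\,\sigma^2$, roughly a 2% improvement in RMSE. If the predictors are not equally good, write $r = \sigma_1 / \sigma_2 > 1$ for the ratio of the weaker predictor's error to the stronger one's. The average is worse than the stronger predictor alone whenever

$$
r^2 + 2\rho\,r - 3 > 0
$$

which for $\rho = 0.93$ means $r > 1.036$. Under these assumptions, a weaker predictor whose errors are only about 4% larger is already enough to make the average worse than the better predictor by itself.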
| | Wilson | iRPI | TrueSkill+MOV | Govan |
|---|---|---|---|---|
| 1-Bit | 0.83 | 0.83 | 0.80 | 0.79 |
| Winning Percentage | 0.97 | 0.98 | 0.92 | 0.93 |
As this shows, even the 1-Bit predictor ("the home team wins by 4.5") is highly correlated with the better predictors, and using just the winning percentage pushes the correlation to 0.92 or higher. Adding these predictors to an ensemble with the better predictors also results in worse performance.
Of course, it's always possible that some combination of predictors will improve performance. There's been some interesting work in this area -- see (Caruana 2004) in Papers. But for right now I don't have the infrastructure to search all the possible combinations.
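One inexpensive alternative to an exhaustive search is greedy forward selection: start with an empty ensemble and keep adding whichever predictor most improves the averaged prediction. The sketch below is only an illustration of that general idea, not necessarily the procedure in the cited paper. The names `rmse` and `greedy_ensemble`, the `max_members` cap, and the synthetic data are all hypothetical, and a real run should score candidates on held-out games rather than on the games used to fit the predictors.

```python
import numpy as np

def rmse(pred, actual):
    """Root-mean-square error of predicted margins against actual margins."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def greedy_ensemble(predictions, actual, max_members=5):
    """Greedy forward selection of an averaging ensemble.

    At each step, add the predictor (repeats allowed) that gives the lowest
    RMSE for the averaged prediction; stop when nothing improves the score.
    """
    actual = np.asarray(actual, dtype=float)
    preds = {n: np.asarray(p, dtype=float) for n, p in predictions.items()}
    chosen = []
    total = np.zeros_like(actual)   # running sum of chosen predictions
    best = float("inf")
    while len(chosen) < max_members:
        # Score the average that would result from adding each candidate.
        scores = {n: rmse((total + p) / (len(chosen) + 1), actual)
                  for n, p in preds.items()}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break                   # no candidate improves the ensemble
        chosen.append(name)
        total = total + preds[name]
        best = scores[name]
    return chosen, best

# Synthetic example: two predictors with independent errors of equal size.
rng = np.random.default_rng(1)
actual = rng.normal(0.0, 11.0, size=2000)
preds = {
    "A": actual + rng.normal(0.0, 10.0, size=2000),
    "B": actual + rng.normal(0.0, 10.0, size=2000),
}
chosen, best = greedy_ensemble(preds, actual)
print(chosen, round(best, 2))
```

With independent errors the search picks both predictors and beats either one alone. With errors as correlated as the ones measured above, it would be expected to stop after picking the single best predictor.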