Methodology
Ratings are tested on data collected from the 2009, 2010 and 2011 Division I NCAA basketball seasons. A Common Lisp program processes each game, generates the ratings and statistics for both teams coming into that game, and writes them to a data file. The first thousand games of the season are not used (because the ratings and statistics do not yet have enough data to be reliable), and the last 150 games are also dropped (to avoid training on post-season, neutral-court tournament games). This leaves about 12K games in the corpus.
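As a rough illustration of the trimming step, here is a minimal Python sketch (the actual processing is done in Common Lisp). The function name `trim_season` and its parameters are hypothetical, and it assumes the cut is applied to each season's games in chronological order.

```python
def trim_season(games, skip_first=1000, skip_last=150):
    """Return the games kept for the corpus from one season.

    `games` is assumed to be a chronologically ordered list for a single
    season. The first `skip_first` games are dropped because the ratings
    have too little data behind them, and the last `skip_last` games are
    dropped to leave out post-season tournament games.
    """
    if len(games) <= skip_first + skip_last:
        return []  # season too short to keep anything
    return games[skip_first:len(games) - skip_last]
```

Applying this to each of the three seasons and concatenating the results would give the combined corpus.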
RapidMiner is then used to perform a cross-validation. The full data set is (randomly but repeatably) split 99/1, and the larger portion is used to train a model (usually linear regression) based upon the ratings (and/or other statistics) of the two teams and the game outcome. The model is then tested on the smaller portion of the data set. This cross-validation is repeated 100 times, and the averaged results are the performance measures reported in the blog.
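The split/train/test loop can be sketched in a few lines of Python. This is not the RapidMiner process itself, only an illustration under stated assumptions: each game is reduced to a single hypothetical predictor (the difference between the two teams' ratings) and a single outcome (margin of victory), the error measure is RMSE, and the names `fit_line` and `repeated_holdout` are invented for the example.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def repeated_holdout(rows, test_fraction=0.01, repeats=100, seed=42):
    """Average test RMSE over `repeats` random train/test splits.

    `rows` is a list of (rating_difference, margin_of_victory) pairs;
    both fields are assumptions made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the random splits repeatable
    n_test = max(1, round(len(rows) * test_fraction))
    errors = []
    for _ in range(repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        mse = mean((y - (a + b * x)) ** 2 for x, y in test)
        errors.append(mse ** 0.5)
    return mean(errors)
```

Seeding the generator is what makes the splits "random but repeatable": running the function twice with the same seed and data yields the same averaged error, so different rating systems can be compared on identical splits.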
Data
Basketball data is scraped from Yahoo! Sports. An archive of game data as compiled by Danny Tarlow and Lee-Ming Zen can be found here.
Tools
All of the tools used for this effort are open-source and/or freely available.
Basketball data was scraped using Web-Harvest. The Tarlow/Zen archive was scraped using the code posted here.
Data processing and calculations (e.g., RPI) are done in Common Lisp. I use Steel Bank Common Lisp, specifically the Windows port maintained by Anton Kovalenko. Development is done in Emacs using SLIME.
Data mining and prediction models are run in RapidMiner.
Some post-processing of results, etc., is done in OpenOffice Calc.
Code
The code for the predictor is not generally available; however, I do make some sample code available: