Tuesday, September 18, 2012

Awakening From the Long Summer's Sleep

College basketball fans hibernate in the summer.

I'm slowly awakening from my March Madness-induced stupor and starting to prepare for the new season.

One of the first tasks is to look at conference realignments.  My predictors don't actually use conferences for anything -- I keep thinking that conference games will have more predictive power than non-conference games, or vice versa, but to date neither has proven to be true.  Nonetheless, I keep track of each team's conference affiliation, so every fall I have to update that data to reflect the various conference moves.

I took my summary of the changes from "Blogging the Bracket."  There was little interest in the data files I provided last year, so I won't bother posting the compiled data unless someone asks for it.  If you'd like it, please let me know.

The next task is to scrape the schedule of games for the season.  In past seasons, I've scraped the schedule from Yahoo Sports.  Unfortunately, it appears that they have "updated" their interface and broken everything.  No scheduled games appear at all, and the majority of the tournament games from last year are missing as well.


Hopefully this is just a temporary situation while Yahoo Sports gets its bugs fixed and the data loaded.  Alternative sources of this data are not easy to find.  ESPN and CBS are still showing last year's games.  The NCAA website started carrying game results (and box scores) last season, but doesn't seem to have the upcoming games.

In the meantime, I've been thinking about how to predict early season games.  These games are difficult to predict because we do not have any history of past performance for this year's teams.  So we're forced to base our predictions on other data -- or to not predict early season games (which is what I've done in past seasons).  Some of the alternative data is available for only a subset of teams (e.g., the AP preseason rankings) or is entirely subjective, which makes it less useful from my viewpoint.

One source of objective data for all the teams is their previous season's performance.  One approach to predicting the early season games is to assume that teams will be just as strong this year as they were last year.  Another approach might be to assume that teams will regress toward the mean -- the best teams from last year will get a little weaker and the worst teams will get a little stronger.  We could also look at team data such as the number of graduating seniors and use that information to modify the previous year's performance -- e.g., a team that lost most of its starting minutes would get weaker.  An intriguing idea is to see if we can predict the change in a team's performance from season to season (based upon what factors?) and then use that prediction to modify the previous year's performance.
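
To make the first three ideas concrete, here is a minimal Python sketch.  It is only an illustration under some loud assumptions: each team's strength is a single number (higher is better), and the names `shrink`, `minutes_weight`, and `typical_returning` are hypothetical knobs invented for this example, not values taken from any tested model.  Setting `shrink` to 0 reproduces the "same as last year" assumption, a value between 0 and 1 pulls every team toward the league average, and the optional returning-minutes term nudges a team up or down depending on how much of its playing time comes back.

```python
from statistics import mean


def preseason_ratings(last_season, shrink=0.25, returning=None,
                      minutes_weight=0.0, typical_returning=0.6):
    """Estimate preseason team ratings from last season's final ratings.

    last_season:       dict of team name -> last season's rating
    shrink:            0.0 keeps last year's rating unchanged; 1.0 sets
                       every team to the league average (hypothetical knob)
    returning:         optional dict of team name -> fraction of last
                       season's minutes that return this season (0.0 to 1.0)
    minutes_weight:    rating points gained per unit of returning fraction
                       above the typical value (hypothetical knob)
    typical_returning: assumed league-typical returning fraction
    """
    league_mean = mean(last_season.values())
    estimates = {}
    for team, rating in last_season.items():
        # Pull the rating toward the league mean by the shrink factor.
        estimate = league_mean + (1.0 - shrink) * (rating - league_mean)
        if returning is not None:
            # Teams with no data are treated as typical (no adjustment).
            fraction = returning.get(team, typical_returning)
            estimate += minutes_weight * (fraction - typical_returning)
        estimates[team] = estimate
    return estimates


if __name__ == "__main__":
    # Made-up ratings purely for illustration.
    last = {"Team A": 10.0, "Team B": 0.0, "Team C": -10.0}
    print(preseason_ratings(last, shrink=0.0))   # carry-over
    print(preseason_ratings(last, shrink=0.25))  # regress toward the mean
```

The values of `shrink` and `minutes_weight` would have to be fit against past seasons (for example, by seeing which settings best predict early season results) rather than guessed; the defaults above are placeholders.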

As time permits, I will set up tests for some of these ideas and report my findings.