Monday, April 7, 2014

Championship Game Prediction

The Prediction Machine hasn't fared very well this Tournament (languishing in the middle of both the Kaggle and Machine March Madness contests) but for what it's worth here is the prediction for the Championship Game:
Connecticut vs. Kentucky:  Kentucky by 2
I'd like to see Connecticut win myself, but I think they have a hard row to hoe.  Napier & Boatright have been destroying opposing guards with their pressure defense.  If they can do that to the Harrison twins and keep them from repeatedly driving the lane, that will certainly help Connecticut's chances.  But so far the referees have been very stingy with charge calls, which is going to make it very difficult for Connecticut's undersized defense to deal with Kentucky's dribble-drive offense.  Wisconsin figured out in the second half that they could mug the Harrisons once they were in the lane with little repercussion, but who knows if the reffing crew tonight will allow that.  And you have to figure that Kentucky is going to continue to enjoy an enormous advantage in rebounding.  Still, anything can happen, and it will hopefully be a tight and entertaining game.

Machine March Madness Winner: Congratulations to Monte McNair!

Apparently none of the competitors in the Machine March Madness have either Kentucky or Connecticut winning the final game, so the contest has been decided, and the winner is Monte McNair with 108 points and 40 correct picks.

(Note that we did have one Machine March Madness competitor who did better than Monte -- "TD" -- but since he never contacted me to explain his entry, he has been disqualified.)
Congratulations to Monte, who continues to be one of the strongest competitors year after year.  (Although unfortunately something went wrong for him in the semi-final games in the Kaggle contest, where he dropped from the top ten to 44th!)

Wednesday, April 2, 2014

Recent Papers Reviewed

I have added several new papers to the Papers archive.  Short descriptions follow.

[Barrow 2013] D. Barrow, I. Drayer, P. Elliott, G. Gaut, and B. Osting, "Ranking rankings: an empirical comparison of the predictive power of sports ranking methods," 2013.

This paper compares a number of ranking systems on predictive power.  The main conclusions are that (1) ranking systems which use margin of victory are more predictive than those that use only win-loss data, and (2) least squares and random walkers are better than other methods for predicting NCAA football outcomes.
[Hvattum 2010] Lars Magnus Hvattum, Halvard Arntzen, "Using ELO ratings for match result prediction in association football," International Journal of Forecasting 26 (2010) 460–470.
This paper looks at using ELO ratings to predict association football (soccer) matches.  ELO was better than all of the other rating systems, but failed to out-perform the market lines.
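For readers unfamiliar with ELO, here is a minimal sketch of the textbook rating update.  It is not taken from the paper: the K-factor of 20 and the 400-point scale are common defaults that I am assuming here, and the function names are my own.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expectation: chance that A beats B, between 0 and 1."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k=20 is an assumed K-factor, not a value from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two evenly rated teams; A wins, so A gains what B loses.
new_a, new_b = update_ratings(1500.0, 1500.0, 1.0)
```

The expectation from `expected_score` doubles as a prediction, which is what makes a rating system like this comparable against market lines.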
[Kain 2011] Kyle J. Kain and Trevon D. Logan, "Are Sports Betting Markets Prediction Markets?  Evidence from a New Test," January 2011.
This paper tests whether the point spread is a good predictor of margin of victory (it is) and whether the over/under is a good predictor of total points scored (it is not).
[Melo 2012] Pedro O. S. Vaz De Melo, Virgilio A. F. Almeida, Antonio A. F. Loureiro, and Christos Faloutsos, "Forecasting in the NBA and Other Team Sports: Network Effects in Action," ACM Transactions on Knowledge Discovery from Data, Vol. 6, No. 3, Article 13, October 2012.
This is a rather interesting paper that models NBA teams as networks exchanging players and coaches.  This allows the authors to look at hypotheses such as "trading players improves a team's performance," or "a player who has played for a number of teams is more valuable than one who hasn't."  They develop metrics such as "team volatility" and use these to predict future performance.
[Page 2007] Garritt L. Page, Gilbert W. Fellingham, C. Shane Reese, "Using Box-Scores to Determine a Position’s Contribution to Winning Basketball Games," Journal of Quantitative Analysis in Sports, Volume 3, Issue 4 2007 Article 1.
This paper looks at box scores for games from the 1996-97 NBA season to determine how important different basketball skills (e.g., defensive rebounding) were to each basketball position (e.g., point guard).  The surprising result was the importance of defensive rebounding by the guard positions and offensive rebounding by the point guard.
[Park 2005] Juyong Park and M. E. J. Newman, "A network-based ranking system for US college football," Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, 2005.
The authors develop a ranking system based upon the intuitive logic that "If A beat B and B beat C, then A indirectly beat C" and apply it to college football.
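One simple way to turn that logic into a number is to count direct wins plus discounted indirect wins.  The sketch below is my own illustration and not the authors' algorithm: it only looks one step deep, and the discount of 0.5 is arbitrary.

```python
from collections import defaultdict


def win_scores(results, discount=0.5):
    """Score each team as direct wins plus discount * indirect wins.

    results is a list of (winner, loser) pairs.  An indirect win is a win
    by a team that you beat.  The discount value is an assumption.
    """
    beaten = defaultdict(list)
    teams = set()
    for winner, loser in results:
        beaten[winner].append(loser)
        teams.update((winner, loser))

    scores = {}
    for team in teams:
        direct = len(beaten[team])
        indirect = sum(len(beaten[loser]) for loser in beaten[team])
        scores[team] = direct + discount * indirect
    return scores


# A beat B and B beat C, so A gets credit for an indirect win over C.
scores = win_scores([("A", "B"), ("B", "C")])
```

In this toy example A finishes ahead of B even though each has exactly one direct win, because A's win came against a team that had itself beaten someone.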
[Strumbelj 2012] Erik Štrumbelj, Petar Vračar, "Simulating a basketball match with a homogeneous Markov model and forecasting the outcome," International Journal of Forecasting 28 (2012) 532–542.
The authors build a possession-by-possession transition matrix for an NBA game based upon box score data and team statistics.  They then use this matrix to predict game outcomes.  The results were not statistically better than methods such as ELO, and worse than point spreads.
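To give a feel for the approach, here is a much-simplified sketch of a possession-by-possession simulation.  It is not the authors' model: the two teams simply alternate possessions, the same outcome probabilities apply for the whole game (the "homogeneous" part), and every number in it is invented for illustration.

```python
import random

# Hypothetical per-possession outcomes (points scored -> probability).
# These values are made up and are not from the paper.
OUTCOMES = {
    "A": {0: 0.50, 2: 0.35, 3: 0.15},
    "B": {0: 0.55, 2: 0.33, 3: 0.12},
}


def simulate_game(possessions_per_team, rng):
    """Play one game with strictly alternating possessions; return (A, B) scores."""
    score = {"A": 0, "B": 0}
    team = "A"
    for _ in range(2 * possessions_per_team):
        points = list(OUTCOMES[team].keys())
        probs = list(OUTCOMES[team].values())
        score[team] += rng.choices(points, weights=probs)[0]
        team = "B" if team == "A" else "A"
    return score["A"], score["B"]


def estimate_win_probability(n_games=2000, possessions_per_team=95, seed=0):
    """Fraction of simulated games that team A wins outright."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        a, b = simulate_game(possessions_per_team, rng)
        if a > b:
            wins += 1
    return wins / n_games


p_a = estimate_win_probability()
```

Simulating many games this way yields a win probability (and, if you keep the scores, a distribution of margins) rather than a single predicted spread.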

Monday, March 31, 2014

Final Four Predictions

The Prediction Machine did pretty well on the Sweet Sixteen games.  I think it would have missed many of the Elite Eight games, but I didn't actually run it so we'll never know.  For the first two games of the Final Four:

#1 Florida vs. #7 Connecticut:  Florida by 5.5

I think most people would agree that Connecticut is the weakest of the Final Four teams.  Florida meanwhile has been rolling along quietly taking care of business.  Short of an abnormal shooting night from one or both of the teams, I don't think UConn has much chance in this game.

#2 Wisconsin vs. #8 Kentucky:  Toss-up
Before watching the Kentucky-Michigan game, I thought Wisconsin was playing the best basketball of any of the contenders.  Now I'm not so sure.  Kentucky has been nearly unstoppable on offense throughout the Tournament, and the fabled freshmen have been impervious to the pressure.  Still, the Wildcats may be vulnerable if they get stymied enough on offense (as they were a couple of times this year against Florida), and Bo Ryan's team is certainly capable of applying the defensive pressure.  But even so, Wisconsin is going to have to be very efficient on the offensive end to stay even with the Wildcats.

Wednesday, March 26, 2014

Adventures in Data Cleansing

According to the ESPN play-by-play data, the American University vs. Penn State game on 12/21/2009 was a blowout -- Penn State won 914-629.

Imagine if it had gone to OT!
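A simple sanity check catches this kind of thing.  The sketch below is illustrative only: the record layout, the field names, and the 200-point threshold are my assumptions, and the second game in the sample is invented.

```python
# Assumed threshold: no college team plausibly scores more than this.
MAX_PLAUSIBLE_SCORE = 200


def implausible_games(games, max_score=MAX_PLAUSIBLE_SCORE):
    """Return the games in which either team's score exceeds max_score."""
    return [
        g for g in games
        if g["score1"] > max_score or g["score2"] > max_score
    ]


games = [
    # The game from the play-by-play data described above.
    {"date": "2009-12-21", "team1": "Penn State", "team2": "American",
     "score1": 914, "score2": 629},
    # An invented, ordinary-looking game for contrast.
    {"date": "2009-12-22", "team1": "Team X", "team2": "Team Y",
     "score1": 71, "score2": 65},
]

flagged = implausible_games(games)
```

Flagged games can then be dropped or checked by hand against another source.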

Machine Madness Competitors: Monte McNair

Next up in our tour of Machine Madness competitors is Monte McNair.  Monte was in this contest last year as well, under the nom de plume "Predict the Madness". 

Monte attended Princeton but is also a lifelong Stanford fan, so he is enjoying their current Tournament run.  As a UCLA fan I'll try not to hold that against him.  At least he isn't a Cal fan.  Monte blogs (infrequently) about sports at Outside the Hashes.  He also runs a site called Ultimate Bracket Challenge that's worth checking out and bookmarking for next year.

Last year he did a posting over on the Number Crunching Life where he talked about his approach.  He uses a logistic regression based upon the location of the game, metrics for the team's offense and defense, and metrics of the team's opponents' averages for both offense and defense.  Unlike some approaches (like mine) that produce a predicted point spread, Monte's approach produces a confidence number.  Monte finished in the middle of the pack last year but is doing much better this year.  He's currently in second in this contest, and is doing quite well over on Kaggle, where he's currently in eleventh.
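To illustrate the difference between a confidence number and a point spread, here is a minimal sketch of how a logistic model produces its output.  This is not Monte's model: the feature names are simplified stand-ins for the inputs described above, and the weights are invented where a real model would fit them to past games.

```python
import math

# Hypothetical weights; a real model would learn these from historical games.
WEIGHTS = {"home_court": 0.6, "offense_diff": 0.08, "defense_diff": 0.08}
BIAS = 0.0


def win_confidence(features, weights=WEIGHTS, bias=BIAS):
    """Squash a weighted sum of features into a confidence between 0 and 1."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Evenly matched teams on a neutral court.
even = win_confidence(
    {"home_court": 0, "offense_diff": 0.0, "defense_diff": 0.0}
)

# A home team with better offensive and defensive metrics.
favored = win_confidence(
    {"home_court": 1, "offense_diff": 5.0, "defense_diff": 3.0}
)
```

The output says how likely a team is to win but nothing directly about the margin, which is why it suits a contest like Kaggle's that scores probabilities.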

Monte has Villanova-Florida-Arizona-Louisville as his Final Four, with Arizona winning it all.  The current leader has Florida for champion, so if Arizona wins it all Monte will likely jump into first and win this contest.

Sweet Sixteen Analysis

Jeff Fogle over on Stats Intelligence has a nice post up analyzing the Sweet Sixteen matchups.  Unlike most analysis you'll see, this is actually grounded in the team statistics instead of some pundit's vague intuitions.

Unfortunately for me, Jeff comes to the same conclusion I did about UCLA's chances against Florida:  not very good.  UCLA did beat Arizona (a team very similar to Florida) in the final of the Pac-12 Tournament, but Arizona was a little tired for that game, and UCLA enjoyed a tremendous advantage on the free throw line.  You never know what the officiating will be like in the Tournament, but I'll be very surprised if UCLA ends up with a significant advantage in that category.