I've been a little slow in getting around to this, but I want to congratulate "SDSU Fan" on winning the 2016 Machine Madness contest! In real life, SDSU Fan is Peter Calhoun, a graduate student in Statistics at (no surprise) San Diego State University. We had a very large pool of entrants this year (40!) so Peter deserves some congratulations for beating the masses. Peter was trailing by a significant amount after the Round of 32, but strong performances in the later rounds (and especially the Final Four) resulted in a big lead by the end.

Peter's model modified the Logistic Regression/Markov Chain (LRMC) approach proposed by Kvam and Sokol to use random forests. Peter also finished fiftieth on Kaggle -- a very strong performance all around.

Despite the large number of entries, nobody had Villanova winning it all. I think that makes the Villanova win a "true upset". I know in my model, Villanova played considerably better than predicted.

Speaking of my model, it follows a strategy in pool-based contests of picking some "likely" upsets to try to maximize the chance of winning. (This is probably more important in a larger pool.) This year, it picked Purdue to make it to the Championship Game. Not only didn't that happen, Purdue was upset in the first round by #12 Little Rock. I'm adding a special "Purdue Rule" to the Net Prophet model so that mistake is never again repeated. :-)

Congratulations again to Peter on a great performance!

# Net Prophet

Exploring algorithms for predicting NCAA basketball games.

## Friday, April 8, 2016

### Paper Reviews

These papers have been added to the paper archive available through the Papers link on the sidebar. Links are also provided for direct download of the papers.

*Dubbs, Alexander, "Statistics-Free Sports Prediction", arXiv.org*

The author builds logistic regression models for MLB, NBA, NFL, and NHL games that use only the teams and scores. This works best for basketball, and the author concludes that "in basketball, most statistics are subsumed by the scores of the games, whereas in baseball, football, and hockey, further study of game and player statistics is necessary to predict games as well as can be done."

COMMENT: I'm not sure the results of this paper say anything deeper than "Compared to the other major sports, NBA has a long season and the teams don't change much from year to year."

*Clay, Daniel, "Geospatial Determinants of Game Outcomes in NCAA Men’s Basketball," International journal of sport and society 02/2015; 4(4):71-81.*

The authors build a logistic regression model for 1,648 NCAA Tournament games that includes features for distance traveled, time zones crossed, direction of travel, altitude and temperature. They conclude "We found that traveling east reduces the odds of winning more than does traveling west, and this finding holds when controlling for strength of team, home region advantage and other covariates. Traveling longer distances (>150 miles) also has a dramatic negative effect on game outcomes..."

COMMENT: This paper shows that travel distance and direction have a statistically significant impact upon game results in the NCAA Tournament, but I want to add a few caveats to this conclusion. First, it isn't clear that the authors understand and control for the fact that there are many more basketball programs (and arguably stronger basketball programs) on the East Coast than elsewhere in the nation. For this reason, it's likely that teams moving west to play in the Tournament are stronger than teams moving east. Since the authors don't control for the strength of teams, it's impossible to say whether the claimed impact of direction of travel means anything. Second, the magnitude of these effects may not be huge. I don't understand how the authors calculate their "Odds Ratio" but factors like strength of team are several orders of magnitude more significant in determining outcome. Third, the authors are measuring strength of team by seed, which has several problems. It's a very coarse measure, it doesn't distinguish between teams with the same seed, and it's often poorly correlated with the actual team strength (i.e., teams are commonly mis-seeded). In my experience, many factors with low significance vanish when team strength is more accurately estimated. I think distance and direction of travel probably do have an impact on Tournament games, but I suspect the true effect is smaller than this paper would indicate.

*Clay, Daniel, "Player Rotation, On-court Performance and Game Outcomes in NCAA Men's Basketball", International Journal of Performance Analysis in Sport, August 2014*

The authors look at the relationship between the size of rotation (how many players play at least 10 minutes in a game) and statistics such as rebounding, shooting percentage, etc. The authors conclude that teams with deep rotation tend to rebound better, particularly on the offensive end. They also have more steals. By contrast, smaller rotation teams tend to shoot the ball better, both field goals and free throws, and they are more effective at taking care of the ball, resulting in fewer turnovers. In general, a larger rotation improves the chance of winning.

COMMENT: There's quite a bit of interesting material in this paper, and I recommend reading it and drawing your own conclusions. I have reservations about some of the conclusions in this paper because the authors have not controlled for number of possessions in the game for many of the statistics. Since I'd expect (for example) both the number of offensive rebounds and the depth of rotation to increase with more possessions, I'm not sure I immediately accept that teams with deeper rotations rebound better. The authors do control for possessions in two of the statistics (offensive and defensive rating) and those conclusions are more convincing. However, as far as I can tell the authors did nothing to control for overtime games, and that may also be affecting the results.

From the specific viewpoint of predicting game outcomes, the authors don't make use of any kind of strength rating, so it isn't clear whether depth of rotation has any predictive value that wouldn't already be covered by a good strength metric.

## Monday, March 28, 2016

### Sorry About That!

I have to apologize to anyone who Stole My Entry over on Kaggle, because the Net Prophet predictor has made a hash of it this Tournament, and is mired low in the Leaderboard and well below the median entry. A number of the upsets have been very improbable according to the Net Prophet predictor and it has suffered accordingly.

It's worth noting that some others have been suffering too: Monte McNair has done better than Net Prophet but not by a whole lot. Ken Massey entered for the first time and is very low on the Leaderboard (apparently because he gambled rather heavily on 2-15 matchups). The most interesting story is ShiningMGF, who started poorly (perhaps because their first-round predictions are influenced by the Vegas lines?) but have been climbing steadily and are now in tenth place. Top Ten finishes three years running is almost certainly a good indication that they know something the rest of us don't!

Over at the Machine Madness contest, Net Prophet isn't doing any better, being one of the many entries that predicted Kansas as the eventual champion. It looks like "SDSU" has the win locked up already. "Predict the Madness" is likely to finish second unless North Carolina loses the next game. Beyond that it gets a little murky, but all the entries with UNC winning it all have an obvious advantage.

But regardless of who wins, it's been a great turnout for the contest (40 entries!) and I want to give my sincere thanks to everyone who entered. It's really great to see so much interest and participation!

## Tuesday, March 22, 2016

### What Would a Perfect (Knowledge) Predictor Score in the Kaggle Competition?

It isn't possible to have a perfect predictor for NCAA Tournament games, because the outcome is probabilistic. We can't know for sure who is going to win a game. But we could conceivably have a predictor with *perfect knowledge*. This predictor would know the true probability for every game. That is, if Duke is 75% likely to beat Yale, the perfect knowledge predictor would provide that number. (Because predicting the true probability results in the best score in the long run.) What would such a predictor score in the Kaggle Contest?

The Kaggle contest uses a log-loss scoring system. In this system, a correct prediction is worth the log of the confidence of the prediction, and an incorrect prediction is worth the log of one minus the confidence of the prediction. (And for the Kaggle contest the sign is then swapped so that smaller numbers are better.)

Let's return to our example of Duke versus Yale. Our perfect knowledge predictor predicts Duke over Yale with 0.75 confidence. What would this predictor score in the long run? (I.e., if Duke and Yale played thousands of times.) Since the prediction is also the true probability that Duke will win, that number is given by the equation:

`0.75 * ln(0.75) + (1-0.75) * ln(1-0.75)`

that is, 75% of the time Duke will win and in those cases the predictor will score ln(0.75), and 25% of the time Yale will win and the predictor will score ln(0.25). This happens to come out to about -0.56 (or 0.56 in Kaggle terms).

So we see how to calculate the expected score of our perfect knowledge predictor given the true advantage. If the favorite in all the Tournament games was 75% likely to win, then our perfect predictor would be expected to score 0.56. But we don't know the true advantage in Tournament games, and they're all different advantages. Is there some way we can estimate this?

One approach is to use the historical results. We know how many games were upsets in past Tournaments, so we can use this to estimate the true advantage. For example, we can look at all the historical 7 vs. 12 matchups and use the results to estimate the true advantage in those games. (One problem with this approach is that in every Tournament, some teams are "mis-seeded". If we judge upsets by seed numbers, this adds some error.)

Between this Wikipedia page and this ESPN page we can determine the win percentages for every possible first-round matchup. There have been a reasonable number of these matchups (128 for each type of first-round matchup) so we can have at least a modicum of confidence that the historical win percentage is indicative of the true advantage:

Seed | Win Pct |
---|---|
1 vs. 16 | 100% |
2 vs. 15 | 94% |
3 vs. 14 | 84% |
4 vs. 13 | 80% |
5 vs. 12 | 64% |
6 vs. 11 | 64% |
7 vs. 10 | 61% |
8 vs. 9 | 51% |

Using the win percentage as the true advantage, we can then calculate what our perfect knowledge predictor would score in each type of match-up:

Seed | Win Pct | Score |
---|---|---|
1 vs. 16 | 100% | 0.00 |
2 vs. 15 | 94% | -0.22 |
3 vs. 14 | 84% | -0.45 |
4 vs. 13 | 80% | -0.50 |
5 vs. 12 | 64% | -0.65 |
6 vs. 11 | 64% | -0.65 |
7 vs. 10 | 61% | -0.67 |
8 vs. 9 | 51% | -0.69 |

Since there are equal numbers of each of these games, the average performance of the predictor is just the average of these scores: -0.48.

This analysis can be extended in a straightforward way to the later rounds of the tournament, but since there are fewer examples in each category it's hard to have much faith in some of those numbers. But I would expect the later round games to make the perfect knowledge predictor's score worse, because more of those games are going to be close match-ups like the 8 vs. 9 case.

So 0.48 probably represents an optimistic lower bound for performance in the Kaggle competition.

UPDATE #1:

Here's a rough attempt to estimate the performance of the perfect predictor in the other rounds of the Tournament.

According to the Wikipedia page, there have been 52 upsets in the remaining rounds of the Tournament (a rate of about 2%). If we treat all these games as having an average seed difference of 4 (which is a conservative estimate), then our log-loss score on these games would be about -0.66. (Intuitively, this is as we would expect -- with most of the low seeds eliminated, games in the later rounds are going to be between teams that are more nearly equal in strength, so our log-loss score will be correspondingly worse.) Since there are about as many first round games (32) as games in all the other rounds combined (31), the overall performance is just the average of -0.48 and -0.66: -0.57 (or 0.57 in Kaggle terms).

UPDATE #2:

Over in the Kaggle thread on this topic, Good Spellr pointed out that if you treat the first round games as independent events with a normal distribution, you can estimate the variance as well:

`variance = (1/n^2) sum_(i=1)^n p_i*(1 - p_i)*(Log[p_i/(1 - p_i)])^2`

which works out to a standard deviation of about 0.07. That means that after the first round of the tournament, the perfect predictor's score would fall in the range [0.34, 0.62] about 95% of the time.
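For anyone who wants to check the arithmetic in this post, here is a minimal Python sketch. It assumes the rounded win percentages from the first-round table and four games of each matchup type (one per region); the function and variable names are my own inventions for illustration, not anything from Kaggle.

```python
import math

# Favorite's historical win rate by first-round matchup, copied from the
# table in this post (rounded to whole percents).
WIN_PCT = {
    "1 vs. 16": 1.00,
    "2 vs. 15": 0.94,
    "3 vs. 14": 0.84,
    "4 vs. 13": 0.80,
    "5 vs. 12": 0.64,
    "6 vs. 11": 0.64,
    "7 vs. 10": 0.61,
    "8 vs. 9": 0.51,
}


def expected_log_score(p):
    """Long-run score of a predictor that states the true probability p."""
    # Skip zero-probability outcomes, i.e. treat 0 * ln(0) as 0.
    return sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def score_variance(p):
    """Variance of the single-game score when the true probability is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome always scores the same
    return p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2


scores = {name: expected_log_score(p) for name, p in WIN_PCT.items()}
average = sum(scores.values()) / len(scores)

GAMES_PER_MATCHUP = 4  # assumption: one game of each type per region
n = GAMES_PER_MATCHUP * len(WIN_PCT)  # 32 first-round games
variance = GAMES_PER_MATCHUP * sum(score_variance(p) for p in WIN_PCT.values()) / n**2
std_dev = math.sqrt(variance)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"average: {average:.2f}, standard deviation: {std_dev:.2f}")
```

Because the inputs here are the rounded percentages, one or two of the per-matchup scores can differ from the table in the last digit, but the average and the standard deviation should still land at roughly -0.48 and 0.07.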

## Sunday, March 20, 2016

### A Quick Update

I'm still in Brooklyn watching games (well, we're done watching now -- had a couple of fun games) and have been too busy to do more than minimum checking of email, but I found time to check on the Machine Madness contest. I see that we have an amazing 40 contestants -- presumably most found us through the Kaggle Contest, but it's great to see the participation. What's not so great is that the Net Prophet entry is doing poorly both here and at the Kaggle Contest, but that's a post for another day :-)

## Tuesday, March 15, 2016

### Year End Rankings

I'm not really into ranking teams that much (because match-ups depend on many more factors), but I came up with a new (and I think better) rating system today and here's how it ranks the Top Twenty:

Rank | Team | Rating |
---|---|---|
1 | North Carolina | 131.6 |
2 | Kansas | 129.6 |
3 | Michigan State | 126.9 |
4 | West Virginia | 125.3 |
5 | Virginia | 117.9 |
6 | Villanova | 114.9 |
7 | Oregon | 112.1 |
8 | Xavier | 110.4 |
9 | Purdue | 109.3 |
10 | Louisville | 108.9 |
11 | Arizona | 106.1 |
12 | Duke | 105.4 |
13 | Kentucky | 105.2 |
14 | SMU | 104.0 |
15 | Indiana | 103.9 |
16 | Oklahoma | 103.7 |
17 | Miami Florida | 99.9 |
18 | Maryland | 97.9 |
19 | Baylor | 97.7 |
20 | Wichita State | 97.4 |

I'm not entirely sure what I think of this. The top of the rankings isn't too surprising, although I think most folks wouldn't have UNC ahead of Kansas and MSU. Oklahoma is much lower than the #2 seed they received. Wichita State is also a surprise at 20 -- although they seem to be handling Vanderbilt tonight so maybe there's something to that.

And I guess you could conclude that it's a bad year for Louisville and SMU to be on probation -- they were both very solid this year.

## Monday, March 14, 2016

### Does Coaching Experience Matter?

One of the things I investigated in the run-up to the Tournament this year was whether coaching experience matters. My approach was pretty simplistic -- I offered my prediction model information on how a team/coach had performed the previous year in the Tournament to see if that information had any predictive value. It didn't -- at least for my model.

Over at Harvard Sports Analysis Collective (worth reading, by the way), Kurt Bullard takes a better look at the same question. He looks at how coaches perform relative to their seeding over their coaching lifetime. If experience matters, you'd expect coaches with more experience to do better. But that's not the case -- there's no correlation between how well a coach does and how much experience he has. (Alternatively, it could be that his experience is factored into the seed his team gets, although I'd argue that's probably not the case.)

At any rate, you might want to be leery of analysts who say that "Michigan State is going to do well in the Tournament because Coach Izzo has more experience than anyone in the Tournament." Michigan State probably **is** going to do well -- but that's because the Committee mis-seeded them, not because of Coach Izzo's experience.