Wednesday, December 9, 2015

Working Overtime

Overtime is one of the interesting quirks of basketball.  In some sports -- particularly low-scoring sports like soccer and hockey -- a game may end in a tie.  But in college basketball, teams play additional periods -- as many as needed -- until a winner is determined.

Overtime games skew team statistics.  ESPN and other sites typically have pages of statistics such as "Points Per Game".  But if one game is 40 minutes long and another goes to six overtimes and is 70 minutes long (40 minutes of regulation plus six 5-minute overtime periods), it's not really an apples-to-apples comparison.  This is one reason analysts are fond of "per possession" statistics -- not only do they correct for pace of play, but they also correct for overtime games.
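To make the length problem concrete, here is a minimal sketch of one simple correction: rescaling a counting stat to a 40-minute basis. This is not the per-possession approach and not necessarily what my predictor does; the function names and the sample point totals are made up for illustration. The only facts it relies on are that regulation is 40 minutes and each overtime period is 5 minutes.

```python
REGULATION_MINUTES = 40  # two 20-minute halves in college basketball
OVERTIME_MINUTES = 5     # each overtime period is 5 minutes


def game_minutes(overtimes):
    """Game-clock length of a game with the given number of overtime periods."""
    return REGULATION_MINUTES + OVERTIME_MINUTES * overtimes


def per_40(stat_total, overtimes):
    """Rescale a counting stat (points, rebounds, ...) to a 40-minute basis."""
    return stat_total * REGULATION_MINUTES / game_minutes(overtimes)


# Hypothetical totals: 70 points in a regulation game vs. 105 points
# in a six-overtime game.
regulation_rate = per_40(70, overtimes=0)
six_ot_rate = per_40(105, overtimes=6)
```

With these made-up numbers the six-overtime team scored 35 more points, yet its scoring rate per 40 minutes is lower than the regulation team's, which is exactly the distortion a raw "Points Per Game" figure hides.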

Clearly the statistics you feed into a predictor need to be corrected somehow for overtime games.  But there's another interesting overtime issue to consider:  What's the final score of an overtime game?

One choice is to use the score at the end of the overtime(s).  The other is to treat the game as a tie.  There are intuitive arguments in favor of both choices.  The fact that Syracuse beat Connecticut suggests that Syracuse is a better team, regardless of how many minutes that took, so we should treat the game as a win for Syracuse.  On the other hand, the teams were deadlocked for six overtimes, which suggests that they're about as equal as it is possible to be, regardless of whether one team or the other managed to win the game in the wee hours of the morning.

Or maybe the game should be treated as a tie for some statistics and not for others.

As longtime readers of this blog are aware, I'm a believer in doing whatever works best.  So in this case, I made two runs of my predictor, one treating overtime games as ties and one using the actual final scores.  In my case, the predictor performed better treating overtime games as ties.

Another possibility is to treat the final margin of an overtime game as 1 or -1 (or 0.1 and -0.1 if your predictor can handle fractional margins), depending upon which team wins the overtime period(s).  This retains the won/loss information, but otherwise treats the game as (nearly) a tie.
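For anyone who wants to try the experiment, the three options discussed above (actual score, tie, and nominal margin) can be sketched as a single preprocessing step. This is only an illustration under my own assumptions: the function name, the `mode` strings, and the home/away framing are hypothetical, and your predictor may want the data in a different shape.

```python
def adjusted_margin(home_score, away_score, overtimes, mode="tie", nominal=0.1):
    """Return the home team's margin of victory to feed to a predictor.

    mode="actual":  use the real final margin, overtime or not.
    mode="tie":     treat any overtime game as a 0-point margin.
    mode="nominal": treat an overtime game as +/- `nominal`, keeping only
                    the won/loss information.
    """
    margin = home_score - away_score
    # Regulation games are never altered.
    if overtimes == 0 or mode == "actual":
        return float(margin)
    if mode == "tie":
        return 0.0
    if mode == "nominal":
        return nominal if margin > 0 else -nominal
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical game: the home team wins 90-85 in double overtime.
as_actual = adjusted_margin(90, 85, overtimes=2, mode="actual")
as_tie = adjusted_margin(90, 85, overtimes=2, mode="tie")
as_nominal = adjusted_margin(90, 85, overtimes=2, mode="nominal")
```

Running the predictor once per mode over the same games, and comparing the resulting prediction error, is the whole experiment.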

For those of you who also have predictors, I encourage you to try the same experiment and report back which choice (if either) works better for you.


  1. Of course I didn't scrape for whether or not OT happened, and haven't updated it for the new ESPN yet!

    What if you treated all close games (1 score, maybe) as ties, and not just OT ones? OT sometimes happens just because of down-to-the-wire luck and some fouling. It doesn't seem to reflect on a team's skill (unless a team noticeably choked and lost a good lead) as being better or worse than another.

  2. If you search the blog for "close games" you'll see where I've done that experiment with a number of different rating systems. I'm not sure if I've tried it with the current incarnation of the predictor -- it's certainly worth an experiment!

  3. In the previous posts, it looks like you removed games with a low MOV from the data, but here you treat them as a tie. Keeping them around would, I'd assume, help prediction if only by retaining links between teams in the matchup graph. I say that because my impression of rating systems is that they do better when the graph of teams is not just connected, but dense. NCAA football has this problem most of the time, which is why I'm glad there are playoffs now!

  4. Hrm.. interesting idea Scott! I did scrape minutes played but never bothered to integrate it.. guess this would be a good measurement for overtime or not. I'll have to play around with it! I like the -0.1 / 0.1 idea!