Wednesday, March 27, 2013

The Prediction Machine’s Bracket

The Prediction Machine is primarily focused on picking the margin of victory for regular season games, but I also use it to create a bracket for the Machine Madness Contest.  The contest has been going on for a few years, and my approach to picking a bracket has evolved.

Initially, the Prediction Machine picked the most likely winner of each game – whichever team it deemed stronger.  But there’s a serious drawback to this approach.  The Committee is already pretty good at determining the relative strength of the teams, so by and large the Prediction Machine’s picks agreed with the seedings.  It only differed where the Committee had “mis-seeded” teams.  That seems to happen every year, but there are usually only one or two mis-seeds.  So you end up with a bracket that may be the most likely outcome, but which is also going to be very similar to many other brackets.  (In fact, we see that very thing in this year’s Machine Madness competition: “Danny’s Dangerous Picks” and “Predict the Madness” are identical after the second round.)  This makes it very hard to finish high in a pool with a lot of entrants.

In the next iteration, I forced the Prediction Machine to pick about 15% of the games as upsets.  I chose that number because historically, that’s about how many upsets there are each Tournament.  The Prediction Machine did this by ranking the upsets and selecting the top 6 upsets in the first round and 5 more in the rest of the tournament.  The idea was to get away from the consensus picks of the other competitors while picking the most likely upsets.  But this is too risky a strategy.  Depending upon the size of the pool, you probably don’t need to get 11 upsets correct to do very well.  For example, in last year’s Machine Madness pool, it would have been sufficient to get 8 points from upsets – which could be just one correct upset pick in the round of 8.

This year, the Prediction Machine used an algorithm that took a target number of upset points and tried to select the most likely set of upsets to meet that total.  Initially I planned to use a target of 8 points – based on last year’s results – but in the end decided to set the target higher, with the goal of ending up in the top 5% of the ESPN contest if the upsets occurred as predicted.  I placed that goal at a (somewhat arbitrary) 50 points.  I then used the Prediction Machine to predict all the chalk matchups in the tournament.  This identified a number of games where the Prediction Machine thought the lower-seeded team would win:

Home          Away
Georgetown    Florida
UCLA          Minnesota
Kansas St.    Wisconsin
Colorado St.  Missouri
Memphis       St. Mary's
New Mexico    Arizona

This adds up to 11 points of mis-seeds.  That’s a surprising number and may reflect an unusual basketball season.  When I plugged these upsets in and ran the tournament again, I discovered that the Prediction Machine also favored #3 Florida over #1 Kansas (an 8 point game), so I added that in for 19 total points of mis-seeds.

The PM then identified the most likely upsets in the remaining games.  These were the top results:

Home         Away
Gonzaga      Ohio St.    17.5
Notre Dame   Iowa St.    17.4
Miami (FL)   Marquette   17.3
Louisville   Indiana     14.1

The PM then added upsets in order of likelihood until the total reached at least 50 points (in this case it overshot to 64).  (The next upset on the list was Oklahoma over San Diego State.)
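
As a rough illustration of that last step, here is a minimal Python sketch of the "add upsets until the target is reached" loop. It is not the Prediction Machine's actual code. The figures 17.5, 17.4, 17.3 and 14.1 come from the table above, and I am assuming that a higher figure means a more likely upset and that the second-listed team is the upset winner. The point values are my own reconstruction, assuming 1-2-4-8-16-32 scoring by round, and the names `Upset` and `add_upsets_until_target` are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upset:
    winner: str
    loser: str
    score: float  # figure from the table; ASSUMED: higher means more likely
    points: int   # ASSUMED bracket points under 1-2-4-8-16-32 scoring


def add_upsets_until_target(base_points, candidates, target):
    """Add upsets from most to least likely until the total reaches the target."""
    chosen = []
    total = base_points
    for upset in sorted(candidates, key=lambda u: u.score, reverse=True):
        if total >= target:
            break
        chosen.append(upset)
        total += upset.points
    return chosen, total


# Point values are assumptions; only the scores appear in the post.
CANDIDATES = [
    Upset("Ohio St.", "Gonzaga", 17.5, 8),
    Upset("Iowa St.", "Notre Dame", 17.4, 1),
    Upset("Marquette", "Miami (FL)", 17.3, 4),
    Upset("Indiana", "Louisville", 14.1, 32),
]

# Start from the 19 points of mis-seeds and aim for 50.
chosen, total = add_upsets_until_target(19, CANDIDATES, 50)
for upset in chosen:
    print(f"{upset.winner} over {upset.loser}: +{upset.points}")
print("total upset points:", total)
```

Under these assumed point values the loop passes 50 only when it adds the final-game upset, which is how a greedy rule like this can overshoot the target by a wide margin.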

There are a couple of refinements to this approach that I haven’t had time to incorporate.  A simple refinement would be to drop 14 points of upsets to get back to 50 points.  A more complex refinement would be to try different combinations of upsets to get the most likely combination that reaches the target points.  Either refinement in this year would have ended up keeping just the Louisville-Indiana upset in the final game.
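
Both refinements can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something the Prediction Machine does today. The point values assume 1-2-4-8-16-32 scoring by round, the win probabilities are hypothetical placeholders (the post does not give any), the joint likelihood treats games as independent, and the function names are invented for the example.

```python
from itertools import combinations
from math import prod
from typing import NamedTuple


class Upset(NamedTuple):
    label: str
    score: float  # figure from the post's table; ASSUMED: higher = more likely
    points: int   # ASSUMED bracket points under 1-2-4-8-16-32 scoring
    prob: float   # HYPOTHETICAL win probability, placeholder only


CANDIDATES = [
    Upset("Ohio St. over Gonzaga", 17.5, 8, 0.45),
    Upset("Iowa St. over Notre Dame", 17.4, 1, 0.45),
    Upset("Marquette over Miami (FL)", 17.3, 4, 0.45),
    Upset("Indiana over Louisville", 14.1, 32, 0.40),
]
BASE_POINTS = 19  # mis-seed points already in the bracket
TARGET = 50


def prune_overshoot(chosen, base, target):
    """Simple refinement: drop any upset whose removal keeps the total >= target."""
    kept = list(chosen)
    total = base + sum(u.points for u in kept)
    for upset in sorted(chosen, key=lambda u: u.score):  # least likely first
        if total - upset.points >= target:
            kept.remove(upset)
            total -= upset.points
    return kept, total


def best_combination(candidates, base, target):
    """Complex refinement: brute-force the most likely subset that reaches target."""
    best, best_prob = None, -1.0
    for k in range(len(candidates) + 1):
        for combo in combinations(candidates, k):
            if base + sum(u.points for u in combo) < target:
                continue
            joint = prod(u.prob for u in combo)  # assumes independent games
            if joint > best_prob:
                best, best_prob = list(combo), joint
    return best


kept, kept_total = prune_overshoot(CANDIDATES, BASE_POINTS, TARGET)
best = best_combination(CANDIDATES, BASE_POINTS, TARGET)
print([u.label for u in kept], kept_total)
print([u.label for u in best])
```

With these assumed values, both functions keep only the final-game upset. The brute-force search is fine for a handful of candidates, but it doubles in cost with each candidate added, so a longer list would need something smarter.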

It’s just as well that I didn’t have time to implement either refinement.  This year’s Machine Madness field turned out much larger than expected (27 competitors!) and even if Indiana wins everything, I won’t win the competition unless Marquette beats Miami – one of the upsets that would be dropped to get back to 50.

Looking at the Prediction Machine’s performance, in the first round it went 2-2 on mis-seeds/upsets, and in the second round 1-1.  Getting 50% of upset picks correct is probably a pretty good performance.  In the ESPN competition, the Prediction Machine’s bracket is at the 94.4th percentile of about 8 million entries, with 7 of its Round of Eight picks still alive.


  1. So the PM predicted that two out of three among Wichita State, Florida Gulf Coast, and La Salle would make the Sweet 16? I think you have a typo. And if it is really 13 out of 16, wouldn't that be the same as if one just followed the committee's seedings?

  2. You're right, that was a typo. I meant 7 out of the Final 8.

