Showing posts with label ncaa tournament. Show all posts

Thursday, March 22, 2012

Yet Another Look at Upsets

What does it mean to call a tournament game an upset?

At the simplest level, it means a lower-seeded team beating a higher-seeded team.  This can happen for two reasons.  First, the committee may have "blown" the seedings -- as they arguably did with Texas / Cincinnati and Purdue / St. Mary's this year, two games that most of the machine predictors thought would be upsets.  Second, an upset can happen when the weaker team plays well and/or the better team plays poorly.  College basketball teams don't play at their mean performance every game.  Some games are better and some are worse, and this can lead to an unexpected result.  This understanding suggests that upsets may be more likely when two inconsistent ("volatile") teams meet.

Imagine two hypothetical teams that played the same schedule.  Team A averaged 84 points per game and scored between 81 and 88 points every game.  Team B also averaged 84 points per game, but scored between 28 and 96 points.  Now both of these teams play Team C, which averaged 70 points per game against the same competition.  Which team is Team C more likely to beat?  It seems reasonable to guess Team B.
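One way to put rough numbers on this intuition is the sketch below. It assumes each team's score is an independent normal random variable, which is a simplification. The standard deviations (2 points for Team A, 15 for Team B, 8 for Team C) are made-up values chosen only to mirror the example; only the 84 and 70 point averages come from the text.

```python
from math import sqrt
from statistics import NormalDist

def upset_probability(fav_mean, fav_sd, dog_mean, dog_sd):
    """P(underdog outscores favorite), assuming independent normal scores."""
    # The margin (favorite minus underdog) is itself normally distributed.
    margin = NormalDist(fav_mean - dog_mean, sqrt(fav_sd**2 + dog_sd**2))
    # An upset corresponds to a negative margin.
    return margin.cdf(0.0)

# Hypothetical standard deviations: 2 (Team A), 15 (Team B), 8 (Team C).
p_vs_a = upset_probability(84, 2, 70, 8)
p_vs_b = upset_probability(84, 15, 70, 8)
print(f"Team C beats Team A: {p_vs_a:.3f}")
print(f"Team C beats Team B: {p_vs_b:.3f}")
```

Under these assumptions the wider spread of Team B makes the 14 point gap in averages much easier for Team C to overcome.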


So how can we identify these "volatile" teams?  The obvious method is to measure something like the standard deviation of a team's performance over the course of the season.  But we have to be careful in how we do this.  For example, measuring the standard deviation of points scored might be very misleading because of pace issues.

Fortunately for me, I already have a good measure of team performance that includes standard deviation: TrueSkill.  This probably isn't a perfect proxy for measuring a team's consistency, but it's certainly good enough for a quick investigation into the merits of predicting upsets by looking at consistency.  (It's easier to think of this measure as volatility rather than consistency, so that the higher values mean more volatility.)

I took all of this year's first round games and ranked them according to the combined volatility of the two teams involved and then identified the most volatile game at each seed differential to see how well this predicted upsets:

Seeding   Most Volatile Game by Seed Differential   Upset?
8-9       Kansas St. - Southern Miss                N
7-10      St. Mary's - Purdue                       Y
6-11      Murray St. - CSU                          N
5-12      Vanderbilt - Harvard                      N
4-13      Wisconsin - Montana                       N
3-14      Marquette - Iona                          N
2-15      Missouri - Norfolk St.                    Y
1-16      Syracuse - NC Asheville                   N

This seems mildly promising.  It identifies two upsets correctly, including the Missouri-Norfolk St. upset.  This is particularly interesting because that upset was not on anyone's radar.   Most of the other games are at least "reasonable" choices for upsets in their seedings.  (It also identifies CSU over Murray St, which may explain this pick by AJ's Madness in the Machine Madness contest.)

One problem with this approach is that seeding is a rather broad measure of team strength.  For example, Duke was by far the weakest of the #2 seeds.  It might be productive to use a more accurate measure of the strength differences between the teams.  We can use the mean TrueSkill measure for each team to do that, and rank teams according to the sum of the standard deviations divided by the difference of the means.  That results in this table:
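As a rough sketch of that ranking, assuming each team is summarized by a TrueSkill-style mean and standard deviation: the function name, the guard against a zero gap, and the ratings in the usage lines are all invented for illustration and are not the values behind the table.

```python
def volatility_score(fav_mu, fav_sigma, dog_mu, dog_sigma, min_gap=1e-6):
    """Sum of the two standard deviations divided by the gap between the means."""
    gap = abs(fav_mu - dog_mu)
    # Guard against division by zero when two teams are rated identically.
    return (fav_sigma + dog_sigma) / max(gap, min_gap)

# Hypothetical (mean, standard deviation) ratings for three matchups.
games = {
    "Team P - Team Q": ((30.0, 1.5), (22.0, 1.5)),
    "Team R - Team S": ((27.0, 2.0), (25.0, 2.5)),
    "Team T - Team U": ((26.0, 1.0), (24.0, 1.0)),
}

# Highest score first: the games where volatility is large relative to the gap.
ranked = sorted(
    games,
    key=lambda g: volatility_score(*games[g][0], *games[g][1]),
    reverse=True,
)
for name in ranked:
    print(name)
```

A closely matched game between two inconsistent teams rises to the top, while a lopsided game between two steady teams sinks to the bottom.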

Seeding   Most Volatile Game by Strength Differential   Upset?
8-9       Creighton - Alabama                           N*
7-10      St. Mary's - Purdue                           Y
6-11      SDSU - NC State                               Y
5-12      Temple - USF                                  Y
4-13      Michigan - Ohio                               Y
3-14      Georgetown - Belmont                          N
2-15      Duke - Lehigh                                 Y
1-16      North Carolina - Lamar                        N
* One point win for Creighton

This works remarkably well for this year's first round -- especially considering that there were no upsets in the 3-14 or 1-16 matchups.  Of course, identifying the most likely upset at a particular seeding isn't quite the same as identifying the most likely upsets across the whole bracket, so let's look at the top 8 upsets predicted by this metric across the entire first round:

Seeding   Most Volatile Games Overall   Upset?
5-12      Temple - USF                  Y
6-11      SDSU - NC State               Y
7-10      Notre Dame - Xavier           Y
7-10      St. Mary's - Purdue           Y
8-9       Creighton - Alabama           N*
7-10      Florida - Virginia            N
6-11      Cincinnati - Texas            N
8-9       Memphis - St. Louis           Y
* One point win for Creighton

Again, this is pretty good performance -- the first four picks are all correct, as are five of the first eight.

To a certain extent, a good predictor is going to capture some of this anyway (the Pain Machine picked three of the upsets among those first four games), but the volatility of team performance may be useful additional information in predicting tournament upsets.

Wednesday, March 21, 2012

Upset Review

For the past three years that the Pain Machine has participated in the Machine Madness contest, I've maintained (without any real justification) that the proper strategy is to pick the correct upsets -- as opposed to simply picking the most likely outcome, which will be the higher seed in every case where the committee hasn't completely blown the seeding.  In light of that, I wanted to review the PM's upset-picking strategy and see how it has worked out this year.

The PM predicts the Margin of Victory for each tournament game.  With two exceptions this year, the predicted winner was the higher-seeded team.  Historically, we know that the upset rate in the first round has been around 22%, and the upset rate for the whole tournament around 15%.  (An upset is where a team seeded at least 2 lower than its opponent wins the game.  A #9 over a #8 is not considered an upset.)  In light of this, I force the PM's tournament picks to include 6 upsets in the first round and 5 more in the rest of the tournament.

The picking strategy is fairly straightforward.  First of all, any games where the PM thinks an upset will happen are marked as upsets.  After that, the PM fills out the quota of 6 first-round upsets by marking the remaining games with the lowest predicted MOVs as upsets.  Then (after recalculating the rest of the bracket based upon those upsets) it fills out the quota of 5 upsets in the rest of the bracket by the same criterion.
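The first-round half of this strategy could be sketched as follows.  This is an illustration under assumptions, not the PM's actual code: the function name and data layout are invented, and the predicted MOV is taken from the favorite's point of view, so a negative value means the model itself expects the upset.  The MOVs are a handful of the first-round values quoted in this post.

```python
def pick_upsets(predicted_mov, quota):
    """Return the games to mark as upsets.

    predicted_mov maps a game label to the favorite's predicted margin of
    victory; a negative margin means the model already predicts the upset.
    """
    # Games where the model itself calls for the underdog always count.
    picks = [g for g, mov in predicted_mov.items() if mov < 0]
    # Fill the rest of the quota with the closest remaining games.
    remaining = sorted(
        (g for g in predicted_mov if g not in picks),
        key=lambda g: predicted_mov[g],
    )
    picks.extend(remaining[: max(quota - len(picks), 0)])
    return picks

# A subset of this year's first-round games (underdog listed first).
first_round = {
    "USF over Temple": 1.4,
    "Norfolk St. over Missouri": 23.2,
    "Texas over Cincinnati": -0.6,
    "Purdue over St. Mary's": 3.3,
    "Belmont over Georgetown": 6.3,
    "NC State over SDSU": 1.9,
    "WVU over Gonzaga": 3.3,
    "UConn over Iowa St.": 3.6,
    "Xavier over Notre Dame": 7.7,
}
first_round_picks = pick_upsets(first_round, quota=6)
for game in first_round_picks:
    print(game)
```

The sketch leaves out the step of recalculating the rest of the bracket before picking the later-round upsets.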

This year, that resulted in these upset picks (predicted MOV shown in parentheses, correct picks bolded) for the first round:

(11) Texas over (6) Cincinnati (-0.6)
(12) Cal/USF over (5) Temple (1.4)
(11) NC State over (6) SDSU (1.9)
(10) Purdue over (7) St. Mary's (3.3)
(10) WVU over (7) Gonzaga (3.3)
(9) UConn over (8) Iowa St. (3.6)

The PM picked 3 of these 6 upsets correctly: USF, NC State and Purdue.  Texas shot just 16% in the first half and still managed to tie the game in the second half but couldn't finish the rally.  The other two games were not very close.  Still, getting 50% correct on upsets is probably pretty good performance.

The PM has the following upsets picked in later rounds:

(2) OSU over (1) Syracuse (-0.8)
(2) Kansas over (1) Kentucky (0.6)
(11) Texas over (3) FSU (2)
(5) New Mexico over (4) Louisville (2.7)
(6) Baylor over (2) Duke (3)

The FSU and Duke upsets cannot happen.  The New Mexico upset did not happen.  The other two games have not yet occurred.

We can also look at (say) the most likely upsets at each seed position.  These were:

(16) UNC-Asheville vs. (1) Syracuse (16.1)
(15) Lehigh vs. (2) Duke (12.8)
(14) Belmont vs. (3) Georgetown (6.3)
(13) Ohio vs. (4) Michigan (7.9)
(12) Cal/USF over (5) Temple (1.4)
(11) Texas over (6) Cincinnati (-0.6)
(10) Purdue over (7) St. Mary's (3.3)
(9) UConn over (8) Iowa State (3.6)

Again, the PM got 50% correct.

Of course, the PM also missed a number of upsets:

(12) VCU over (5) Wichita St. (9.6)
(10) Xavier over (7) Notre Dame (7.7)
(15) Norfolk St. over (2) Missouri (23.2)
(11) NC State over (3) Georgetown (5.4)

The Norfolk State win really stands out here as the outlier -- it was at least twice as unlikely as the Duke-Lehigh upset.  I don't have the statistic handy, but 23 point upsets have to be rarer than 1 in 1000 historically.  (The beating Norfolk St. took in the next round is indicative of how anomalous the first round upset was.)  VCU was a darling upset pick for many, in part due to their Cinderella status last year.  This year's VCU team was considerably weaker, and the win over Wichita State was another very unlikely result.  The Georgetown upset was the least surprising.  The 5 point differential is well within the ~10 point error margin of the PM's predictions.

Overall, I give the PM a very positive grade for its upset picks.  It's clearly able to identify games where upsets are likely.  I may have to work on how it selects upsets, though.  There isn't a strong correlation between the magnitude of MOV and the likelihood of upset when MOV is under about 6 points, so it may not make sense to pick the games with the lowest MOVs.  It may make more sense to pick upsets based upon other factors.

Tuesday, March 20, 2012

NCAA Tournament Home Court Advantage

It's a common assumption that neutral court games should be treated differently from games played at one team's home court, but is that really true?  The SI article that looked at home court advantage concluded that it was primarily due to the referees treating the home team differently.  That jibes with something I found -- that large home dogs don't get a HCA.

Presumably the refs don't give the benefit of the doubt when they know the home team is overmatched.

I did some other experiments (prior to starting the blog, so they aren't documented here) where I trained a predictor on regular season games using just a strength measure for each team, so that the prediction equation looked like this:

    MOV =  (C1 * Strength of Home Team) + (C2 * Strength of Away Team) + C3

C2 was negative, and C3 (along with any C1/C2 ratio) was the "home court advantage".

I then tested the accuracy of this predictor on NCAA tournament games, first treating the higher seed as the home team, then the lower seed as the home team, and then washing out HCA altogether by dropping C3 and forcing C1 & C2 to be equal. 
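The three test configurations might look like the following sketch.  The coefficient values and team strengths are placeholders rather than fitted numbers, the function names are made up, and reading "forcing C1 & C2 to be equal" as equal in magnitude (with C2 still negative) is an assumption.  A real test would refit the constrained model rather than average the two coefficients as is done here.

```python
def predict_mov(home_strength, away_strength, c1, c2, c3):
    """MOV = (C1 * home strength) + (C2 * away strength) + C3."""
    return c1 * home_strength + c2 * away_strength + c3

def tournament_predictions(high_seed, low_seed, c1, c2, c3):
    """Predicted margin for the higher seed under each treatment of HCA."""
    # Treat the higher seed as the home team.
    high_home = predict_mov(high_seed, low_seed, c1, c2, c3)
    # Treat the lower seed as the home team, then flip the sign so the
    # margin is still from the higher seed's point of view.
    low_home = -predict_mov(low_seed, high_seed, c1, c2, c3)
    # Neutral court: drop C3 and use one shared coefficient magnitude.
    c = (c1 - c2) / 2
    neutral = predict_mov(high_seed, low_seed, c, -c, 0.0)
    return {"high_home": high_home, "low_home": low_home, "neutral": neutral}

# Placeholder coefficients and strengths, not fitted values.
preds = tournament_predictions(high_seed=12.0, low_seed=6.0, c1=1.0, c2=-0.9, c3=3.5)
for label, mov in preds.items():
    print(f"{label}: {mov:.1f}")
```

Each configuration would then be scored against the actual tournament margins to see which one predicts best.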

What I found was that the best prediction was made treating the higher seed as the home team.  This makes some intuitive sense -- the refs are giving the benefit of the doubt to the team that they "know" is the better team.  So I'm a little dubious that there's really no "HCA" in tournament games, although I don't know that anyone else has looked at it.

Friday, March 16, 2012

Pain Machine Tournament Picks

You should be able to see (I believe) the Pain Machine's tournament picks here

For the tournament, I don't like to make all "chalk" picks so I am set up to force a certain number of upset picks.  (This is complicated somewhat by the play-in games.)  If the PM predicts an outcome where a lower seed beats a higher seed, that counts as one of the upset picks.  Otherwise the PM converts the weakest wins to losses until it has the requisite number of upsets.  This year, the PM's predicted first-round upsets (in order of likelihood) were:

Texas (11) over Cincy (6)                   
Cal (12) over Temple (5)                      
NC State (11) over San Diego St. (6)  
Purdue (10) over Saint Mary's (7)        
WVU (10) over Gonzaga (7)                 

There are a couple of upsets in the later rounds, notably Kansas over UK (although that game is a near coin-flip according to the PM -- UK by 0.6 points).

Texas came very close to beating Cincy.  The PM actually also had Texas upsetting FSU, so that loss hurts.

Cal didn't even make it to the Temple game, having one of the worst tournament performances ever in losing to USF.  Nonetheless I kept this prediction, now USF over Temple.  NC State beat SDSU.  The Purdue / St. Mary's game is tonight.  I was at the WVU / Gonzaga game -- Gonzaga simply outplayed WVU and shot very, very well.

The PM is once again competing in the Machine Madness contest, which you can follow here.  As in the past few years, the "chalk" picks are leading the contest, but that will change if any of the competitors get a few upset picks right.  A quick look suggests that so far only a few entries have gotten an upset pick correct -- the PM has the NC State result and AJ's Madness got the VCU pick correct.  (The bottom couple of entries seem fairly random.)  The overall winner will probably be determined by who wins the tournament.  If Kentucky wins it will be Danny or the Matrix Factorizer (depending upon the outcome of Duke/Baylor).  Otherwise it will go to the predictor who got the winner correct.

Thursday, March 15, 2012

Last Minute Corrections

The Pain Machine had predicted a rather easy victory for Cal over USF.  Apparently it got that one wrong :-).
This has caused some last minute shuffles in its predictions.

I will post its full predictions after the Tournament starts.  I'm actually in Pittsburgh to watch the games here.  The OSU game doesn't start until 9:50 PM !?  What is the NCAA thinking?

Tuesday, March 13, 2012

Even More Upsetting!

My model does not have a #1 seed winning the tournament.  It has the title game as a near coin-flip game that will be the final upset of the tournament.

This does not include any corrections for the Fab Melo situation, but that would not change the outcome.

Monday, March 12, 2012

It's Upsetting!

According to my predictor, the most likely upsets at the 4, 3, 2, and 1 levels:

Ohio (13) over Michigan (4)            [Louisville close behind]
Belmont (14) over Georgetown (3) [FSU close behind]
Lehigh (15) over Duke (2)                 [By far the weakest #2 seed]
Lamar (16) over UNC (1)                   [Syracuse close behind]

Lock of the tournament: MSU over LIU Brooklyn.