Monday, January 27, 2014

Top Twenty & Predictions (1/27)


1 Oklahoma St. 33.99 --
2 Louisville 33.96 (+1)
3 Arizona 33.67 (-1)
4 Duke 32.74 (+4)
5 Iowa 32.73 (-1)
6 Kentucky 32.58 (+2)
7 Creighton 32.58 (+4)
8 Michigan 32.03 (+5)
9 Kansas 32.01 --
10 Iowa St. 31.87 (+2)
11 UCLA 31.84 (+5)
12 Ohio St. 31.75 (-6)
13 Villanova 31.65 (-8)
14 Pittsburgh 31.55 (+4)
15 Syracuse 31.55 (-1)
16 Michigan St. 31.53 (-4)
17 Florida 31.33 (+2)
18 Arkansas 31.12 (-1)
19 Wisconsin 31.02 (-3)
20 Arizona St. 31 NEW

Oklahoma State clings to the #1 spot but Louisville continues to slowly gain ground.  The big movers this week are UCLA (+5), Michigan (+5), tOSU (-6), and Villanova (-8).  UCLA is probably the mystery there.  Certainly they benefit from solid wins over Stanford and Cal, but that seems like a big jump just based on those games.
PREDICTIONS
#11 Oklahoma State @ #25 Oklahoma:  Oklahoma State by 2
#24 Baylor @ #11 Oklahoma State: Oklahoma State by 17

The first game was an upset alert when I accidentally predicted it last week, but Oklahoma has regressed slightly since then.  Still a good chance for an upset, though.  Baylor's going to be out of the AP Top 25 after this week.
#18 Duke @ #20 Pittsburgh: Pittsburgh +4
#18 Duke @ #2 Syracuse: Syracuse +3

Duke has the ACC road trip from hell this week.  If they can win one or both of these games it will solidify their ranking.  Interesting that the PM thinks they have a better chance @Syracuse.
#3 MSU @ #10 Iowa: Iowa +6

Iowa is one of those teams the PM respects more than the AP does, so this will be an interesting test.
UPSET ALERT
#14 Kentucky @ LSU:  Toss-up
This could be a "trap" game for Kentucky's freshmen.
#16 Iowa St. @ #8 Kansas: Kansas +6
ISU is not quite as good as they looked earlier in the season.
#15 Cincy @ #11 Louisville: Louisville +15

Cincy bubbles on and off the bottom of the PM's Top Twenty while Louisville is solidly at the top, so it should be no surprise that the PM thinks this will be an easy victory for the Cardinals.
#17 tOSU @ #9 Wisconsin:  Wisconsin +6
These two teams have been racing each other to be the first to drop off the Top Twenty, but one of them will have to win.  Probably the home team.
BLOWOUT OF THE WEEK:   UCF @ #11 Louisville:  Louisville by 25
TOSS-UP OF THE WEEK:   Houston Baptist @ Lamar

Sunday, January 26, 2014

(1/20) Predictions Recap

#24 Baylor @ #6 Kansas:  Kansas +12

Kansas +10.  The PM apologizes for its inaccuracy.
 
#22 Kansas State @ Texas:  Texas +2

Texas +3.  The PM apologizes again.  Texas beat Baylor later in the week for 3 straight wins over ranked opponents.  (Which Michigan also managed this week.)
 
#9 Wisconsin @ Minnesota: Minnesota +1

Pitino's Gophers have played 4 ranked opponents in a row.  This would be a good win if they can manage it.

Oh, they managed it.  They won by 13.  The Badgers seem to be in a total tailspin.

Colorado @ #1 Arizona:  Arizona +12

Arizona +12

#10 Iowa @ #21 Michigan: Michigan +4
#21 Michigan @ #3 MSU: MSU +3.5

If Michigan manages to win both games this week, they're going to solidify as one of the best teams in the country.

Michigan +8, +5.  So two nice wins this week for Michigan.
 
#22 Kansas State @ #16 Iowa State:  Iowa State +9

Iowa State +6
 
TOSS-UP OF THE WEEK:  Oregon St. @ Washington

Not much of a toss-up as it turned out; Oregon State won by 11.
 
BLOWOUT OF THE WEEK:  Maine @ Stony Brook (+23.5)

Maine +18.  A better blowout pick might have been Dartmouth @ Harvard.  Dartmouth lost by 30.

Monday, January 20, 2014

Top Twenty & Predictions (1/20)

TOP TWENTY 

1 Oklahoma St. 34.17 -
2 Arizona 33.64 -
3 Louisville 33.60 -
4 Iowa 32.79 (+1)
5 Villanova 32.38 (+4)
6 Ohio St. 32.19 -
7 Duke 32.17 (+4)
8 Kentucky 32.11 -
9 Kansas 31.87 (+7)
10 Creighton 31.86 (-3)
11 Iowa St. 31.84 (-7)
12 Michigan St. 31.66 (+2)
13 Michigan 31.61 (+5)
14 Syracuse 31.49 (-2)
15 UCLA 31.43 -
16 Wisconsin 31.23 (-6)
17 Arkansas 31.16 (-4)
18 Pittsburgh 31.11 (-1)
19 Florida 30.80 (+2)
20 Cincinnati 30.79 (-1)

A lot of movement in the Top Twenty this week.  Kansas vaults upwards 7 places after beating #8 ISU and #9 OKSt.  Likewise Michigan, with a big win over #3 Wisconsin and a string of solid victories since losing by only 2 to #1 Arizona.  Villanova has been quietly creeping up the Top Twenty with solid wins against lesser opponents.  The big losers of the week were ISU (losers of their last three games) and Wisconsin (losers of their last two).


PREDICTIONS

#24 Baylor @ #6 Kansas:  Kansas +12

Baylor is #41 in the PM's rankings. 
#22 Kansas State @ Texas:  Texas +2

Texas can follow up the nice win over ISU with a win over KSU and maybe climb into the AP rankings.  (Although they have a tough test @Baylor later in the week.)
#9 Wisconsin @ Minnesota: Minnesota +1

Pitino's Gophers have played 4 ranked opponents in a row.  This would be a good win if they can manage it.
#10 Iowa @ #21 Michigan: Michigan +4

The PM has these teams pretty closely matched; Michigan gets the home court advantage.
Colorado @ #1 Arizona:  Arizona +12

It's obligatory to include this game, but Colorado doesn't have much hope.
#21 Michigan @ #3 MSU: MSU +3.5

A surprisingly close prediction on this game.  If Michigan manages to win both games this week, they're going to solidify as one of the best teams in the country.
#22 Kansas State @ #16 Iowa State:  Iowa State +9

Iowa State gets a chance to right their sinking ship.
TOSS-UP OF THE WEEK:  Oregon St. @ Washington
BLOWOUT OF THE WEEK:  Maine @ Stony Brook (+23.5)
UPSET ALERT OF THE WEEK:  Oklahoma State @ Oklahoma

Thoughts on the Kaggle Contest

As mentioned in a previous post, Kaggle is sponsoring a March Madness contest.  After some false starts, I managed to figure out the rules, scoring, and submission format.  The first phase of the contest is scoring predictions for the past five tournaments.  I entered a submission based on the Prediction Machine's point spreads and placed in the Top Ten of the leaderboard.

Some random thoughts about the contest in no particular order.

(1) The "typical" March Madness contest awards full points for predicting a game correctly, and no points for predicting a game incorrectly.  Scoring is totally dependent upon the outcome of the game, so scoring for games between closely matched opponents is essentially random.  Consequently, winning these contests usually comes down to getting a few late round upset picks correct, a topic I've previously explored.

The Kaggle contest is using an interesting alternative scoring method, the log loss, also called the predictive binomial deviance.  Submissions give a likelihood from 0 to 1 for a particular outcome (e.g., Arizona will beat UNC Greensboro).  The more certain (closer to 1) the prediction, the higher the reward (penalty) for getting the game right (wrong).  For close games, you can predict an outcome around 0.50 and get a small reward if you are right but only a small penalty if you are wrong.  This scoring metric does a better job of rewarding contestants who accurately judge the relative strengths of the teams in each game rather than the outcome of that particular game (if that makes sense).

You can also think of this scoring method as a betting strategy.  When you place a high likelihood on a particular outcome, it's like betting a lot on the game.  When you place an even likelihood on a particular outcome, it's like betting only a small amount on the game.  The winner is the contestant who ends up with the most money at the end of the tournament.
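To make the scoring concrete, here is a minimal sketch of the log loss calculation as I understand it from the description above.  The function name and the clipping constant are my own choices for illustration, not anything taken from the contest's code.  Note that lower scores are better.

```python
import math

def log_loss(predictions, outcomes, eps=1e-15):
    """Mean log loss over a set of games; lower is better.

    predictions: predicted probability that the first team wins each game
    outcomes:    1 if the first team actually won, 0 otherwise
    eps:         clipping constant (an assumption) to avoid log(0)
    """
    total = 0.0
    for p, won in zip(predictions, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if won else math.log(1.0 - p)
    return -total / len(predictions)

# A confident pick that is right, a confident pick that is wrong,
# and a 50/50 pick (which scores the same either way).
confident_right = log_loss([0.95], [1])
confident_wrong = log_loss([0.95], [0])
coin_flip = log_loss([0.50], [1])

print(confident_right, coin_flip, confident_wrong)
```

The confident-and-right pick scores best, the 50/50 pick lands in the middle, and the confident-and-wrong pick is punished heavily, which is the betting intuition described above.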

(2)  The problem with predicting "likelihood" is that there's no way to measure the actual likelihood.  If we made the teams play 100 games, we'd get a good approximation of the likelihood, but that's obviously not reasonable.  So there are really two parts to each submission:  (a) assessing the relative strength of the competing teams, and (b) translating that into a likelihood of victory for one of the teams.

To see that these are two separate problems, imagine that every competitor had to base their entry on the RPI scores of the teams.  Every competitor would have the same relative strength assessment.  But they could translate that into a likelihood any way they wanted.  One competitor might use an exponential model with an exponent of 15, another an exponential model with an exponent of 22, another a logistic distribution, etc.  The winner in this case would be whoever happened to pick the best likelihood model for that year's tournament.
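As a rough illustration of the second problem, the sketch below takes the same strength assessment (expressed here as a predicted margin in points, which is my assumption for the example) and runs it through two different translation functions.  The `scale` and `sigma` values are hypothetical placeholders, not fitted parameters, and neither function is claimed to be what any contestant actually uses.

```python
import math

def logistic_win_prob(margin, scale=6.0):
    """Logistic translation; `scale` is a hypothetical tuning parameter."""
    return 1.0 / (1.0 + math.exp(-margin / scale))

def normal_win_prob(margin, sigma=11.0):
    """Normal-CDF translation; `sigma` is a hypothetical spread of game margins."""
    return 0.5 * (1.0 + math.erf(margin / (sigma * math.sqrt(2.0))))

# Same strength assessment, two different likelihoods for each game.
for margin in (1, 4, 12):
    print(margin,
          round(logistic_win_prob(margin), 3),
          round(normal_win_prob(margin), 3))
```

Both functions agree that an even matchup is 50/50 and that bigger margins mean more likely wins, but they disagree on how quickly the likelihood rises, and that disagreement alone changes the log loss score.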

To my mind, it would be a better test of the predictors to have them predict the point spread of each game.  Point spread is directly measurable and is the best proxy we have for likelihood, so we'd eliminate that element of how well the competitors translated relative strength to likelihood.  But this is probably a minor point -- predicting likelihood with a log loss evaluation is overall a pretty good approach.

(3)  So what's the right strategy for this contest?  The default strategy is obviously to make your best possible predictions for the games and enter that.  But does it ever make sense to intentionally use something other than your best possible prediction?

In a traditionally-scored tournament pool, I believe it does make sense to pick against your best predictions.  The reason is that most good predictors are going to have similar outcomes for almost all the games.  In that situation, the best possible result for your best predictions might be to end up in a multi-way tie for first place.  But in any decent size pool, the most likely result is that you're going to lose to someone who got lucky and picked one or more of the inevitable upsets.  So if you want to win the pool, you need to pick upsets yourself, and hope to get lucky.

It isn't clear to me that the same reasoning applies with the log loss scoring method.  Since it rewards accurate assessment more than game outcome, it may be that the best strategy is to simply use your best possible predictions.

(4)  Phase One of this contest is essentially meaningless.  The outcomes of the last five tournaments are known, so it is trivial to craft a "perfect" submission.  No one has done that yet, but the top of the leaderboard is already filled with (what appear to be) unrealistic submissions.  These submissions are probably "cheating" or are heavily tuned to do well on the Phase One test data.

(5) So what's the best "realistic" score for this contest?  By this, I mean the score over a large number of tournament games.

On the point spread side, the best known predictor for college basketball game outcomes is the Vegas closing line.  This isn't an absolute bound on performance, but it's a good starting point.  As I pointed out above, converting point spreads to likelihoods isn't straightforward, but with one reasonable approach, the lines have a log loss score of around 0.52 for the past few seasons of regular season games.  So I'd be dubious of any approach that does significantly better than that.

(6) That said, it's important to remember that a single NCAA tournament is a very small set of data.  It's perfectly reasonable to expect an approach that would be terrible on average over a large number of tournaments to do very well on any particular tournament (or vice versa).  For example, my entry to Phase One had a score of about .54.  When I look at how that entry scored on each individual season, I see that in some seasons it scored around .51.   So the winner of Phase Two could easily be someone who just happened to get lucky with a good score this year.
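A quick simulation shows how much a single tournament's score can wander.  This is only a sketch under loud assumptions: each simulated tournament has 63 games, the favorite's true win probability in each game is drawn uniformly between 0.50 and 0.95 (a made-up distribution), and the predictor knows those probabilities exactly.  Even this "perfect" predictor gets noticeably different scores from one tournament to the next.

```python
import math
import random

def simulate_tournament_scores(n_tournaments=2000, n_games=63, seed=0):
    """Log loss of a perfectly calibrated predictor over many simulated tournaments."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_tournaments):
        total = 0.0
        for _ in range(n_games):
            # Hypothetical true win probability for the favorite (assumption).
            p = rng.uniform(0.50, 0.95)
            favorite_won = rng.random() < p
            # The predictor submits the true probability p for the favorite.
            total += -math.log(p if favorite_won else 1.0 - p)
        scores.append(total / n_games)
    return scores

scores = simulate_tournament_scores()
mean_score = sum(scores) / len(scores)
print("best:", round(min(scores), 3),
      "mean:", round(mean_score, 3),
      "worst:", round(max(scores), 3))
```

The gap between the best and worst simulated tournaments comes entirely from luck, since the predictor is the same every time.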

It wouldn't be an entirely unreasonable approach to build a model to assess team strengths and an algorithm for translating those strengths into likelihoods, and then tune both to do particularly well on some past tournament (say, 2010).  That's probably not the best general approach, but it might get lucky and do very well this year.

Sunday, January 19, 2014

Predictions Recap (from 1/13)

Some miscues, but overall a good week.  The PM correctly predicted the close games or upsets of Wisconsin@Indiana, tOSU@Minnesota, UCLA@Utah, Baylor@TTU and OK State@Kansas.

#15 Kansas @ #8 ISU:  ISU by 7

Did I say ISU would win by 7?  I meant lose by 7.
#3 Wisconsin @ Indiana:  Wisconsin by 3

I'm surprised to see this close a prediction. 

Apparently the PM's continued faith in Indiana was justified -- they won by 3, stormed the court and annoyed Dan Dakich.  Wisconsin went on to lose a home game to Michigan (?) and Indiana turned around and lost badly at home to Northwestern.  So go figure.
#13 Kentucky @ Arkansas:  Arkansas by 5

The PM thinks more of Arkansas than most observers; this will be a good test for them.

Arkansas won by 2 in OT.  The teams traded 3 pointers in the last 10 seconds of regulation, and looked to be headed to a second OT when the Wildcats failed to block out and allowed a put-back slam at the buzzer to win.
 
#11 tOSU @ Minnesota:  tOSU by 1.5

The Gophers get another shot at a big win.  They played MSU to OT before losing -- can they pull off something against Ohio State?

The answer to that would be "Yes".  Gophers by 10 (!).  Richard Pitino CotY candidate? 
 
#25 UCLA @ #21 Colorado:  Colorado by 4

The PM can't factor in the Dinwiddie injury, so this might be closer than predicted. 

UCLA won handily (+13).  Whether Colorado can adjust for the loss of Dinwiddie in the longer term remains to be seen.
 
Or UCLA might end up losing twice this week -- Utah is better than most people know.  4 losses by a total of 9 points, including an OT loss to then #10 Oregon.

Utah won this game by 5 points.  Sigh.  (I'm a UCLA alumnus.)
 
#22 Pittsburgh @ #2 Syracuse:  Syracuse by 4

Another game where I'd expect a bigger margin.

Syracuse by 5.  The PM apologizes for its inaccuracy.
 
#25 Oklahoma @ #12 Baylor: Baylor by 7

I'm not sure why either of these teams is ranked, although the PM has Baylor just outside the top 25.  That will change after Baylor loses to Texas Tech earlier in the week.

Baylor did indeed lose to Texas Tech and then followed up by losing to Oklahoma.  Certainly won't be ranked #12 next week!
 
#16 UMASS!!!1! @ The Elon Gators:  UMass by 4

Gods' favorite Elon will be overmatched.

I couldn't pass up an Elon game, but I almost used local favorites GMU against UMASS, a game that UMASS won on a last-second shot.  And Elon came into this game off an OT victory over Davidson...  Unfortunately Elon had no magic, and lost by 10.
 
TOSS-UP OF THE WEEK

#9 Oklahoma St @ #15 Kansas:  <1 point

It's going to be tough for Oklahoma State to win at Kansas.

As close as a Toss-Up can be:  Kansas strips OK State at the buzzer to prevent the OT.  And Embiid continues his application for PotY.
 
BLOWOUT OF THE WEEK
TCU @ #9 Oklahoma St: Oklahoma St by 28

TCU somehow managed to lose by 32.

Thursday, January 16, 2014

Kaggle March Madness Contest

Kaggle is sponsoring a March Madness competition to predict the NCAA Tournament.  (Although right now there are no prizes offered.)  Some semi-useful data is provided at the contest, including game outcome data for the past five seasons and tournaments.  The data has been anonymized so its usefulness is limited outside of the contest.

There are two stages to the contest.  Stage One, which is ongoing right now, is to predict the results of the last five tournaments.  Stage Two will be predicting this year's tournament when it starts up in March.  (Obviously Stage One is meaningless -- it's trivial to un-anonymize the data and make perfect predictions.)

I don't see any clear indication of how the contest is being scored.  Predictions are confidence levels, and the FAQ says that all games in the tournament will be equally weighted, but the actual scoring formula isn't given.  I've submitted a topic in the forum asking that question.

Monday, January 13, 2014

About Monte Carlo Simulations

Over on Stats Intelligence (a blog you should read if you don't already), there is a complaint today about the silliness of Monte Carlo simulations.  I have something to say about that, but before I do so, let me give a quick overview of Monte Carlo methods for those who might not be familiar with the approach.

The basic idea behind the Monte Carlo method is that you have a complex simulation of some process.  The outcome of the simulation depends upon some number of factors that you don't know with certainty, although you might be able to guess a likely range for them.  Also, these factors interact in complicated and non-obvious ways, so that a slight tweaking of a factor might lead to an entirely different outcome.  Using this sort of a simulation for prediction is pretty hopeless, because it's too sensitive to your guesses about these factors.  If you change your guess a little bit, the outcome changes.

For example, you might have a computer program that simulates an entire football game down-by-down.  Each down you select an offensive play and a defensive set, choose which receiver to pass the ball to, etc.  You have random factors to cover things like cornerbacks falling down, etc.  This might make a fun game (think John Madden Football) but running Seattle vs. San Francisco one time and using that to predict next weekend's outcome is obviously foolish.

The idea with the Monte Carlo method is to "wash out" this sensitivity by running the simulation many, many times with a sampling of random values for the factors within their likely ranges.  Then you can sum up over the outcomes to get a percentage estimate for each outcome.  For example, you run John Madden Football ten thousand times and Seattle wins 64% of the time.

Over on Stats Intelligence, Jeff complains today about the arbitrary number of iterations people claim for their Monte Carlo simulations to give them a veneer of accuracy and in-depth analysis.  And I agree with him completely on this issue -- it's ridiculous to see "50,000" runs when you know that's 100x more than necessary.

But my complaint is different.

Over at the Harvard Sports Analysis blog (another blog you should read if you don't already), Julian Ryan has a posting which uses a Monte Carlo approach to estimate Harvard's chances of winning the Ivy League championship in basketball.  (By the way, he did 50,000 simulations :-)  In his simulations, he estimated Harvard's chance to win each game based upon Ken Pomeroy's ratings.

This sounds like a sophisticated approach that will give new insights into the Ivy League competition. 

But here's the thing.  The outcome of each simulated game is based upon exactly one factor -- a percentage derived from the ratings of the two teams.  So when Harvard plays Yale, you plug the ratings into a formula and out comes the likelihood of a Harvard victory -- 74%, say.  Now let's imagine we "simulate" that game 50,000 times by rolling a 100-sided die and giving Harvard a victory if the number is 74 or below.  At the end of this excruciating exercise, guess what percentage of the simulated games Harvard has won?

Yes, 74%.
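If you don't believe me, here is the excruciating exercise as a short sketch.  The function name, the seed, and the default arguments are just illustrative choices; the 74% and the 50,000 runs come from the example above.

```python
import random

def simulate_win_fraction(threshold=74, sides=100, n_runs=50_000, seed=1):
    """Roll a `sides`-sided die `n_runs` times; a roll <= threshold is a Harvard win."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_runs) if rng.randint(1, sides) <= threshold)
    return wins / n_runs

fraction = simulate_win_fraction()
print(f"Harvard won {fraction:.1%} of the simulated games")
```

The simulated fraction lands within sampling noise of the 74% that was plugged in, so the simulation tells us nothing we didn't already know.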

IMPORTANT

The Monte Carlo method doesn't provide any value if your simulation is based upon a few fixed, known factors.

If you just have a few fixed factors, you can calculate the likelihood of an outcome directly.  You don't need to use a Monte Carlo approach.  If you look at Julian Ryan's results, you'll see that Harvard is the most likely winner of the Ivy League, followed by Princeton and then Columbia.  Is that a big insight from the Monte Carlo approach?  No.  That's simply the order of the teams in the Pomeroy ratings.  If you look at the distribution of Harvard's number of wins, you'll see it looks like a normal distribution.  Well, it should, because that's what you get when you add up a bunch of independent random outcomes.

Now to be fair to Mr. Ryan, he's not the only one to do this sort of thing.  In fact, Ken Pomeroy uses Monte Carlo simulations to predict conference results.  (With, not surprisingly, the same results for the Ivy League.)  The only defense I can offer is that a Monte Carlo simulation is a fairly straightforward way to estimate a number that can be hard to calculate.  If I tell you that Harvard has a 94% chance of winning when it hosts Yale, and a 74% chance when playing at Yale, then the chance that Harvard goes 2-0 is .94*.74, the chance that they go 1-1 is .94*.26 + .06*.74, and the chance that they go 0-2 is .06*.26.  It gets increasingly hard to calculate the likelihoods as the number of teams and games goes up.
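For a single team's record, though, the direct calculation is not hard to automate.  The sketch below builds the exact distribution of wins one game at a time from per-game win probabilities, assuming the games are independent.  The function name is mine, and this only handles one team's win total; working out who wins the league is the genuinely harder part.

```python
def record_distribution(win_probs):
    """Exact distribution of total wins given independent per-game win probabilities.

    Returns a list where entry k is the probability of exactly k wins.
    """
    dist = [1.0]  # before any games: 0 wins with certainty
    for p in win_probs:
        new_dist = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            new_dist[wins] += prob * (1.0 - p)   # lose this game
            new_dist[wins + 1] += prob * p       # win this game
        dist = new_dist
    return dist

# Harvard vs. Yale: 94% at home, 74% away.
dist = record_distribution([0.94, 0.74])
print([round(x, 4) for x in dist])  # [0.0156, 0.2888, 0.6956] for 0, 1, 2 wins
```

No dice required, and the three numbers add up to 1 as they should.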

(Although in a league that plays home-and-home between all teams, this is all unnecessary.  The teams' chances of winning the league fall in the same order as their strength ratings!)

In summary:  For a simple simulation based upon a few fixed factors, the Monte Carlo method may be useful for estimating hard-to-calculate numbers but doesn't offer any additional insight beyond the known factors.

Top Twenty (1/13)

1 Oklahoma St. 34.19
2 Arizona 33.23
3 Louisville 33.08
4 Iowa St. 32.75
5 Iowa 32.66
6 Ohio St. 32.63
7 Creighton 32.19
8 Kentucky 32.13
9 Villanova 32.12
10 Wisconsin 31.66
11 Duke 31.65
12 Syracuse 31.65
13 Arkansas 31.5
14 Michigan St. 31.44
15 UCLA 31.43
16 Kansas 31.4
17 Pittsburgh 31.32
18 Michigan 31.31
19 Cincinnati 31.01
20 Arizona St. 30.91
Big upward moves this week were Wisconsin +7, Kansas +5, and Iowa, UCLA and Pittsburgh all jumping 4 spots.  Downward:  Gonzaga -8, Arkansas -7, Michigan -6.  Oklahoma State continues to cling to the top spot with Arizona and Louisville closing fast.

PREDICTIONS

#15 Kansas @ #8 ISU:  ISU by 7

Should be a straightforward ISU win.
#3 Wisconsin @ Indiana:  Wisconsin by 3

I'm surprised to see this close a prediction.  The PM has Indiana at #46.
#13 Kentucky @ Arkansas:  Arkansas by 5

The PM thinks more of Arkansas than most observers; this will be a good test for them.
#11 tOSU @ Minnesota:  tOSU by 1.5

The Gophers get another shot at a big win.  They played MSU to OT before losing -- can they pull off something against Ohio State?
#25 UCLA @ #21 Colorado:  Colorado by 4

The PM can't factor in the Dinwiddie injury, so this might be closer than predicted.  Or UCLA might end up losing twice this week -- Utah is better than most people know.  (4 losses by a total of 9 points, including an OT loss to then #10 Oregon.)
#22 Pittsburgh @ #2 Syracuse:  Syracuse by 4

Another game where I'd expect a bigger margin.
#25 Oklahoma @ #12 Baylor: Baylor by 7

I'm not sure why either of these teams is ranked, although the PM has Baylor just outside the top 25.  That will change after Baylor loses to Texas Tech earlier in the week.
TOSS-UP OF THE WEEK

#9 Oklahoma St @ #15 Kansas:  <1 point

It's going to be tough for Oklahoma State to win at Kansas.
BLOWOUT OF THE WEEK
TCU @ #9 Oklahoma St: Oklahoma St by 28

The Cowboys pull off the rare two-fer by appearing in both the Toss-Up and the Blowout.

Tuesday, January 7, 2014

Top Twenty (1/6)

1 Oklahoma St. 34.72
2 Louisville 33.68
3 Arizona 33.26
4 Ohio St. 33.08
5 Iowa St. 33.05
6 Arkansas 32.66
7 Kentucky 32.15
8 Iowa 32.14
9 Villanova 32.12
10 Creighton 32.05
11 Duke 31.99
12 Michigan 31.65
13 Gonzaga 31.39
14 Michigan St. 31.27
15 Syracuse 31.26
16 Cincinnati 31.21
17 Wisconsin 31.17
18 Arizona St. 31.15
19 UCLA 30.97
20 Colorado 30.94

Oklahoma State lost ground in the #1 spot but not enough (yet) to fall behind Louisville.  Big movers this week were Gonzaga shooting up from below the Top Twenty to #13 and Arizona St. dropping 8 spots to #18.  Gonzaga still hasn't played any really good teams, but has won its last three games by a minimum of 22 points.    Arizona State meanwhile lost a game to #98 Washington and was penalized accordingly.  Also new to the Top Twenty are Michigan State -- vaulting almost as far as Gonzaga on the basis of the solid win against Indiana -- and UCLA -- being rewarded for demolishing USC and also I give them an extra point (*).   Cincinnati and Wisconsin continue to climb steadily, while Arkansas and Colorado are sinking.
(*) Just kidding.

TOSS-UP OF THE WEEK
#3 tOSU @ #5 Michigan State:  Toss-up

The toss-up of the week is also the first AP Top-25 matchup of the week.  A slight hair of an advantage goes to MSU in what should be a typical BigTennish contest.
#7 Baylor @ #9 Iowa St:  ISU by 12

The PM has ISU just on the edge of the really top-notch teams and Baylor down at #30, so it's no surprise it predicts an easy home victory for ISU.
#23 Illinois @ #4 Wisconsin: Wisc by 9

Illinois beat Indiana last week, so they're the ranked team, but not nearly as good as Wisconsin.
#1 Arizona @ UCLA:  Arizona by 2
The PM believes in UCLA, and home court is going to be worth something, but Arizona should still manage to win this game.
#24 Memphis @ #12 Louisville: Louisville by 18

Memphis's loss to Cincinnati suggests that even at #24 the AP is over-rating them.

Minnesota @ #5 MSU:  MSU by 5

This game might be closer than expected, and MSU may be down if they lose to tOSU.  Go Gophers!
#25 Kansas St @ #18 Kansas: Kansas by 9

Should be a routine win for Kansas, but they have a trap game earlier in the week @ Oklahoma that they might lose.
THE AUBURN CONSOLATION GAME OF THE WEEK
#21 Missouri @ Auburn:  Missou by 2

If Auburn can pull off the upset, it will be some consolation for the Auburn fans grieving about Monday night's loss.
THE BLOWOUT OF THE WEEK
Northwestern @ Iowa: Iowa by 22
Not a lot of blowouts predicted for this week.