Friday, February 19, 2016

More Kaggle News, ESPN Irritates Me

As a follow-up to this previous post, the Kaggle competition is officially back.  A good deal of data is available, and the forums have been moderately active.  The new Kaggle Notebooks feature is getting some exercise, too:  there are 116 scripts for this competition at the moment, although I'm unclear on what they all do.  There are at least a couple of scripts to calculate Elo ratings and similar things.  Might be worth a look if you're just getting started in this area.
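
For anyone who hasn't seen Elo ratings before, here's a minimal sketch of the basic update in Python.  This is not taken from any of the Kaggle scripts; the function names and the K-factor of 20 are my own assumptions for illustration, and real basketball versions usually add things like home-court and margin-of-victory adjustments.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score (win probability) for team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=20.0):
    """Return the new (rating_a, rating_b) after one game.

    k is the K-factor; 20 is an assumed value, not a recommendation.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever the winner gains, the loser gives up.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two evenly rated teams; team A wins.
    print(elo_update(1500.0, 1500.0, a_won=True))
```

Run that over a season's games in date order, starting every team at the same rating, and you have a simple baseline to compare against.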

Prizes this year are considerable -- $25K split 10/6/4/3/2.  I suggested awarding prizes for the best performance on each round of the Tournament, but that might have been too hard to implement quickly.  At any rate, spreading the prizes down to 5th place is a good improvement.  The contest is basically random amongst about the top 100 or so contestants, so weighting all the money at the top makes it even more of a "random number lottery."

On a completely unrelated note, the NetProphet predictor broke on me last night.  It turned out that ESPN has changed the format of its box scores.  You can see the new format here.  The change also seems to have broken all the past seasons.  If you go to (say) November 2014, the scoreboard and schedule pages will claim that no games were played.

ESPN has been modifying their page formats for a while now, and I was expecting a change at some point.  The scoreboard page had earlier been modified to run from JSON data embedded in the page, and I was expecting to see something similar happen with the box scores and other game pages.  But interestingly enough, although the page formats have changed, they haven't gone to using embedded JSON data on these pages.  That's too bad, because pulling the JSON data out of the page, parsing it and then using it is more straightforward -- and probably a lot more robust -- than pulling data out of the HTML.
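
To show why the embedded-JSON route is easier, here's a rough sketch of pulling a JSON object out of a page.  The marker string `pageData =` and the sample page are made up for illustration; they are not ESPN's actual variable names or data, and a real page would need its own marker.

```python
import json


def extract_embedded_json(html, marker):
    """Find `marker` in the page and parse the JSON object that follows it.

    Returns the parsed object, or None if the marker or an opening brace
    isn't found.  The marker is whatever text precedes the JSON in the page
    (hypothetical here).
    """
    start = html.find(marker)
    if start == -1:
        return None
    brace = html.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses one JSON value starting at `brace` and ignores
    # whatever trailing script or HTML comes after it.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Made-up page fragment, for illustration only.
SAMPLE_PAGE = (
    "<html><script>var pageData = "
    '{"events": [{"home": "Team A", "away": "Team B", "score": [70, 65]}]};'
    "</script></html>"
)

if __name__ == "__main__":
    data = extract_embedded_json(SAMPLE_PAGE, "pageData =")
    for game in data["events"]:
        print(game["home"], game["away"], game["score"])
```

Once the data is parsed, a layout change to the surrounding HTML doesn't matter; only a change to the JSON structure itself would break the scraper.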

