Wednesday, September 16, 2015

Coding Update

After about two weeks of spare-time effort, I've translated about 75% of my core processing into Python.  The result is about 6K SLOC.  The Common Lisp version is about 14K SLOC, but that count includes a lot of "dead code" that I edited out while translating, as well as the remaining 25% of functionality that hasn't been translated yet.

Most of the translated functions have validated exactly against the Common Lisp originals, but one of the core algorithms is producing different (but still valid-looking) numbers.  I spent a little time trying to debug the difference but couldn't find any immediate culprits.  So (after some fits and starts) I processed the entire historical game database through the new Python code and used that to train a model.  The model had the same performance and accuracy as the model built from the Common Lisp-processed data, so apparently the differences are irrelevant to the model.  I suspect the Python version is "more correct" than the Common Lisp version, because this is a matrix-manipulation-heavy part of the code, and it is expressed much more succinctly and clearly in Python.
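For anyone doing a similar port, here is a rough sketch of the kind of check that "validating against the originals" implies.  It is not the code I actually used: the function name `compare_outputs`, the dictionary layout, and the tolerances are all made up for illustration, and it assumes both versions can dump their results as lists of numbers keyed by some identifier.

```python
import math

def compare_outputs(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Return (key, index, ref_value, cand_value) for every mismatch.

    `reference` and `candidate` are hypothetical dicts mapping an
    identifier to a list of floats (e.g. one list per team).
    """
    mismatches = []
    for key in sorted(reference):
        ref_vals = reference[key]
        cand_vals = candidate.get(key)
        # A missing key or a length change is reported as a whole-row mismatch.
        if cand_vals is None or len(cand_vals) != len(ref_vals):
            mismatches.append((key, None, ref_vals, cand_vals))
            continue
        for i, (r, c) in enumerate(zip(ref_vals, cand_vals)):
            # Compare with a tolerance so floating-point noise is not flagged.
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((key, i, r, c))
    return mismatches

# Made-up numbers standing in for the two implementations' outputs.
lisp_out = {"team_a": [0.5, 1.25], "team_b": [2.0, 3.0]}
python_out = {"team_a": [0.5, 1.25], "team_b": [2.0, 3.1]}
diffs = compare_outputs(lisp_out, python_out)
```

A report like this only says where the two versions diverge, not which one is right, which is why retraining the model on the new data was the more useful test.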

At some point I'm going to have to decide whether I want to carry the Play-By-Play processing code forward.  I spent a lot of time on the (original) code, and you can pull some interesting data out of the Play-By-Play.  On the other hand, the coverage is poor (especially before about 2012) and the data is full of errors.  Many (most?) games have statistics that don't agree between the Play-By-Play data and the box score.  A surprising number of games have different final scores.  So it's hard to put a lot of faith in the data.
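To make the score-mismatch problem concrete, here is a minimal sketch of a consistency check between the two sources.  The data layout is an assumption for illustration (a list of `(team, points)` scoring events and a dict of final scores), and `pbp_score` and `score_disagreements` are hypothetical names, not functions from my code.

```python
from collections import Counter

def pbp_score(events):
    """Sum scoring events from the Play-By-Play into a per-team total."""
    totals = Counter()
    for team, points in events:
        totals[team] += points
    return dict(totals)

def score_disagreements(events, box_score):
    """Return {team: (pbp_total, box_total)} for teams whose totals differ."""
    pbp = pbp_score(events)
    return {
        team: (pbp.get(team, 0), box_total)
        for team, box_total in box_score.items()
        if pbp.get(team, 0) != box_total
    }

# Made-up game: the Play-By-Play is missing one point for the away team.
events = [("home", 2), ("away", 3), ("home", 1), ("home", 2)]
box_score = {"home": 5, "away": 4}
bad = score_disagreements(events, box_score)
```

Running a check like this over every game would give a quick count of how many games have to be thrown out or repaired before the Play-By-Play data can be trusted.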
