## Friday, July 22, 2011

### Possessions / Game

As mentioned in my previous post, I've been working lately on a game data set that includes game statistics like Field Goal Percentage, Offensive Rebounds, etc.  After some mucking about, I've imported all the data from my web scraping as well as the scraping done by Lee-Ming over at This Number Crunching Life.  Neither of us has the data from after the start of the Tournament last year, so I'm bugging Lee-Ming to scrape up that data and make his archive complete.  About 10-20% of the games are also missing, either because the scrapers failed to pick them up, or because the Yahoo! Sports statistics were incomplete.  Those problems aside, I have about 11K games in the training set, so still plenty to work with.

Apart from reading in the basic stats, the first thing I looked at was calculating the number of possessions in the game.  Possessions aren't a collected statistic, so we have to derive the count from the statistics we have.  Unfortunately, it isn't possible to calculate the number exactly, so I'm using this formula:

Possessions = FGA - OReb + 0.475*FTA + TO

which I've stolen from Ken Pomeroy.  (FGA is field goal attempts, OReb is offensive rebounds, FTA is free throw attempts, and TO is turnovers.)  The "0.475" accounts for the fact that the first free throw of a "shooting two" situation doesn't end the possession.  (Ken Pomeroy is usually credited with research indicating that 47.5% of free throws end possessions, but I've never been able to find that research.)  We know that each team in a game should have the same number of possessions (+/- one possession), so we calculate possessions for both teams, average the two, and use that number.
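A minimal sketch of that calculation in Python might look like the following.  The function names and the box-score numbers are hypothetical, chosen only for illustration; only the formula and the averaging step come from the description above.

```python
FT_WEIGHT = 0.475  # share of free throw attempts assumed to end a possession


def team_possessions(fga, oreb, fta, to):
    """Estimate one team's possessions from its box-score totals."""
    return fga - oreb + FT_WEIGHT * fta + to


def game_possessions(team_a, team_b):
    """Average the two teams' estimates into one number for the game."""
    return (team_possessions(**team_a) + team_possessions(**team_b)) / 2


# Made-up box-score lines, for illustration only.
team_a = {"fga": 55, "oreb": 10, "fta": 20, "to": 12}
team_b = {"fga": 58, "oreb": 12, "fta": 16, "to": 13}

print(round(game_possessions(team_a, team_b), 2))
```

Averaging the two estimates smooths out the cases where the formula over- or under-counts for one side.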

We care about possessions because some statistics can only be interpreted in light of how many possessions a team had in the game.  For example, suppose a team grabbed 10 offensive rebounds.  That's a good performance in a game with 50 possessions; not so good in a game with 100 possessions.
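One way to make that comparison concrete is to scale a raw count to a rate per 100 possessions.  The sketch below is only an illustration: the helper name `per_100_possessions` is hypothetical, and using 100 as the scaling base is an assumption rather than anything settled here.

```python
def per_100_possessions(stat, possessions):
    """Scale a raw game total to a rate per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100 * stat / possessions


# The same 10 offensive rebounds in a slow game and in a fast game.
slow_game = per_100_possessions(10, 50)
fast_game = per_100_possessions(10, 100)

print(slow_game, fast_game)
```

The 50-possession game works out to twice the rate of the 100-possession game, even though the raw rebound count is identical.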

Here are the stats for possessions over the games in the training set:

| | Average | Maximum | Minimum |
|---|---|---|---|
| Possessions | 68 | 121 | 46 |

That maximum of 121 piqued my interest, so I looked into the data and saw it was the 3/12/09 game between Connecticut and Syracuse -- a game that went to 6 overtimes.  That points out one shortcoming in my game data -- I'm not capturing OTs or the number of minutes played.  Hopefully that won't corrupt the value of the data too much.

(The #2 game was CSU-Fullerton vs. CSU Northridge on 2/13/10 -- a triple OT game with 108 possessions.  At the other end of the scale, Denver and LA Monroe played the 46-possession game on 1/11/09.  Interestingly, they were fairly efficient and scored a total of 105 points.  Illinois and Penn St. played a 57-possession game on 2/18/09 and only managed 71 total points -- a game that will no doubt be mentioned in some future post on offensive efficiency. :-) )
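As a rough preview, those two slow games can be compared on points per possession.  This sketch assumes the possession counts quoted above are per-team averages, so the combined points are divided by twice that number; the function and variable names are hypothetical.

```python
def combined_points_per_possession(total_points, possessions_per_team):
    """Points per possession across both teams in one game.

    Assumes each team had roughly `possessions_per_team` possessions.
    """
    return total_points / (2 * possessions_per_team)


denver_monroe = combined_points_per_possession(105, 46)
illinois_penn_st = combined_points_per_possession(71, 57)

print(f"{denver_monroe:.2f} {illinois_penn_st:.2f}")
```

Under that assumption the Denver game comes out at about 1.14 points per possession and the Illinois game at about 0.62, which matches the impression that the first was efficient and the second was not.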