Apart from reading in the basic stats, the first thing I looked at was calculating the number of possessions in the game. Possessions isn't a collected statistic, so we have to derive it from the statistics we have. Unfortunately, it isn't possible to calculate the number exactly, so I'm using this formula:
Possessions = FGA - OReb + 0.475*FTA + TO

which I've stolen from Ken Pomeroy. (FGA is field goal attempts, OReb is offensive rebounds, FTA is free throw attempts, and TO is turnovers.) The "0.475" accounts for the fact that the first free throw of a "shooting two" situation doesn't end the possession. (Ken Pomeroy is usually credited with research indicating that 47.5% of free throws end possessions, but I've never been able to find that research.) We know that each team in a game should have the same number of possessions (+/- one possession), so we calculate possessions for both teams, average the two, and use that number.
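For the curious, here is a minimal Python sketch of that calculation. The function names and the box-score numbers are made up for illustration; only the formula and the averaging step come from the description above.

```python
def estimate_possessions(fga, oreb, fta, to):
    """Estimate one team's possessions from its box-score totals."""
    # 0.475 is the free-throw weight from the formula above.
    return fga - oreb + 0.475 * fta + to


def game_possessions(team_a, team_b):
    """Average the two teams' estimates to get one number for the game."""
    return (estimate_possessions(**team_a) + estimate_possessions(**team_b)) / 2


# Hypothetical box-score totals, not taken from any real game.
home = {"fga": 55, "oreb": 10, "fta": 20, "to": 12}
away = {"fga": 58, "oreb": 12, "fta": 16, "to": 13}

print(round(estimate_possessions(**home), 2))
print(round(estimate_possessions(**away), 2))
print(round(game_possessions(home, away), 2))
```

The two per-team estimates will rarely match exactly, which is why averaging them is a reasonable way to settle on a single figure for the game.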
We care about possessions because some statistics can only be interpreted in light of how many possessions a team had in the game. For example, suppose a team grabbed 10 offensive rebounds. That's a good performance in a game with 50 possessions; not so good in a game with 100 possessions.
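One common way to make that comparison concrete is to scale a raw count to a rate per 100 possessions. The sketch below is only an illustration of the idea, using the 10-rebound example from the paragraph above; the function name is hypothetical.

```python
def per_100_possessions(count, possessions):
    """Scale a raw game total to a rate per 100 possessions."""
    return 100.0 * count / possessions


# The same 10 offensive rebounds at two very different paces.
slow_game = per_100_possessions(10, 50)
fast_game = per_100_possessions(10, 100)

print(slow_game)
print(fast_game)
```

On this scale the slow game works out to 20 offensive rebounds per 100 possessions and the fast game to 10, so the same raw total is twice as impressive at the slower pace.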
Here are the stats for possessions over the games in the training set:
That maximum of 121 piqued my interest, so I looked into the data and saw it was the 3/12/09 game between Connecticut and Syracuse -- a game that went to 6 overtimes. That points out one shortcoming in my game data -- I'm not capturing OTs or the number of minutes played. Hopefully that won't corrupt the value of the data too much.
(The #2 game was CSU-Fullerton vs. CSU Northridge on 2/13/10 -- a triple-OT game with 108 possessions. At the other end of the scale, Denver and LA Monroe played the 46-possession game on 1/11/09. Interestingly, they were fairly efficient and scored a total of 105 points. Illinois and Penn St. played a 57-possession game on 2/18/09 and only managed 71 total points -- a game that will no doubt be mentioned in some future post on offensive efficiency. :-) )
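To put rough numbers on those two games, here is a small Python sketch of combined points per possession. It assumes the possession figure is the per-team number calculated earlier, so the combined points are spread over twice that many possessions; the function name is just illustrative.

```python
def combined_points_per_possession(total_points, possessions_per_team):
    """Points per possession for both teams together.

    Assumes each team had `possessions_per_team` possessions, so the
    game as a whole had twice that many.
    """
    return total_points / (2 * possessions_per_team)


# Totals quoted above for the two slow-paced games.
denver_ulm = combined_points_per_possession(105, 46)
illinois_psu = combined_points_per_possession(71, 57)

print(round(denver_ulm, 2))
print(round(illinois_psu, 2))
```

Under that assumption the Denver game comes out a bit above one point per possession and the Illinois game well below it, which matches the "fairly efficient" versus "only managed" descriptions.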