(Of course, there are a lot of complicating factors. Some teams intentionally take long or short possessions. And possessions that start with an offensive rebound are likely to be shorter than ones that don't. And so on. But I did say this was a simple model.)

A "possession" is typically defined as a period during which one team or the other continuously controls the ball. The traditional way to calculate the number of possessions in a game is to use a formula that looks something like this (there are several widely used variants):

Possessions = FGA - Oreb + 0.475*FTA + TO

If you look at the first two terms, you'll see that if a team has the ball, makes a shot attempt, gets the rebound and makes another shot attempt, that will equate to one possession, because the offensive rebound will "cancel out" the first shot attempt.
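To make the "cancel out" behavior concrete, here is a minimal Python sketch of the traditional formula. The function and parameter names are my own for illustration; only the coefficients come from the formula above.

```python
def traditional_possessions(fga, oreb, fta, to, ft_weight=0.475):
    """Estimate possessions with the traditional box-score formula.

    fga  = field goal attempts
    oreb = offensive rebounds
    fta  = free throw attempts
    to   = turnovers
    """
    return fga - oreb + ft_weight * fta + to


# Shot, offensive rebound, second shot: the rebound cancels the first attempt,
# so the whole sequence counts as a single possession.
print(traditional_possessions(fga=2, oreb=1, fta=0, to=0))  # 1.0
```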

However, I'm interested in a different definition of possession -- one that corresponds with a team receiving a fresh shot clock. This equates to the number of times a team "runs its offense". With this definition, if a team has the ball, makes a shot attempt, gets the rebound and makes another shot attempt, that will equate to *two* possessions.

After some experimentation, a fairly good equation for estimating that number seems to be:

Possessions = FGA + 0.666*FTA + TO + 3

(This is based upon counting the possessions from play-by-play data and performing a linear regression for games from the 2013 season.)
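As a rough sketch, the fitted equation can be applied to one team's box score like this. The function name and the sample box-score numbers are hypothetical; the coefficients and the constant are the regression values quoted above.

```python
def shot_clock_possessions(fga, fta, to, ft_weight=0.666, intercept=3.0):
    """Estimate "fresh shot clock" possessions for one team in one game.

    Offensive rebounds are deliberately absent: under this definition a
    rebound starts a new possession instead of cancelling a shot attempt.
    """
    return fga + ft_weight * fta + to + intercept


# Hypothetical box score, for illustration only.
estimate = shot_clock_possessions(fga=55, fta=20, to=12)
print(round(estimate, 2))  # 83.32
```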

If you calculate the number of possessions for each team, you can then calculate the average length of possession for the game, but you cannot determine the average length of possession for each team. To do that, you need to analyze the play-by-play data. Fortunately, the ESPN Scoreboard provides play-by-play data for the majority of games, and the format is fairly standardized.
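Here is a small sketch of the difference between the two calculations. It assumes a 40-minute regulation game (2400 seconds) and a hypothetical possession log of (team, duration in seconds) pairs; that log format is my own illustration, not the actual ESPN play-by-play layout.

```python
def average_possession_seconds(game_seconds, possessions_a, possessions_b):
    """Game-wide average possession length from two possession counts."""
    total = possessions_a + possessions_b
    if total <= 0:
        raise ValueError("need at least one possession")
    return game_seconds / total


def per_team_average(possession_log):
    """Per-team average length from (team, duration_seconds) records.

    This needs possession-level timing, which box-score counts cannot give.
    """
    totals, counts = {}, {}
    for team, seconds in possession_log:
        totals[team] = totals.get(team, 0.0) + seconds
        counts[team] = counts.get(team, 0) + 1
    return {team: totals[team] / counts[team] for team in totals}


# Assumed regulation length: 40 minutes = 2400 seconds.
print(average_possession_seconds(2400, 80, 80))  # 15.0

# Hypothetical log entries, for illustration only.
log = [("A", 20), ("B", 10), ("A", 30), ("B", 14)]
print(per_team_average(log))  # {'A': 25.0, 'B': 12.0}
```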

I've spent the last couple of weeks figuring out how to scrape the play-by-play data and analyze it to determine changes of possession and average length of possession. Whether these statistics are useful for prediction remains to be seen.

Your formula should be the same as the first one, just without offensive rebounds. You shouldn't have a constant or a different coefficient on FTA.

Yeah, Ken, I'm aware of that, but when I do a linear regression to find the most accurate formula, that's what pops out. The 0.475 is just a magic constant that someone came up with at some point (I've never been able to find a definitive source) so I'm not surprised that I get a different constant. (And in fact, if I try to recreate the first formula, I don't get 0.475.) The constant is more puzzling, but one factor is that someone starts off each half/OT with a possession that won't show up otherwise. And I'm not sure about how alternating possessions are treated in the play-by-play -- I haven't off-hand found any indication in the PBPs I've looked at.

And of course, it's always possible that I have problems in the code that interprets the Play-by-Play.

The .475 isn't magic. Each possession can end in only 3 ways: the team turns the ball over, the team takes a shot, or the team shoots free throws. The team can shoot 1 free throw (an And 1, or if they miss the front end of a 1-and-1), 2 free throws, or 3 free throws. In the And 1 case, the free throw should not be counted because the shot counts. If they miss the first of a 1-and-1, that FT counts. If they shoot 2 or 3, just 1 of the FTs counts. All of those together come out to about .475.
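The counting rule in this comment can be sketched as a ratio: free-throw trips that end a possession, divided by total free throw attempts. The function name and the trip counts below are hypothetical, chosen only to show the arithmetic; they are not real data and are not meant to reproduce .475.

```python
def ft_possession_weight(and_ones, missed_front_ends, two_shot_trips, three_shot_trips):
    """Fraction of free throw attempts that should count as a possession.

    and_ones          : 1 FTA each, 0 possessions (the made shot already counts)
    missed_front_ends : 1 FTA each, 1 possession (missed first of a 1-and-1)
    two_shot_trips    : 2 FTA each, 1 possession (includes a made front end)
    three_shot_trips  : 3 FTA each, 1 possession
    """
    attempts = and_ones + missed_front_ends + 2 * two_shot_trips + 3 * three_shot_trips
    if attempts == 0:
        raise ValueError("no free throw attempts")
    possession_ending_trips = missed_front_ends + two_shot_trips + three_shot_trips
    return possession_ending_trips / attempts


# Hypothetical mix of free-throw trips, for illustration only.
print(ft_possession_weight(and_ones=2, missed_front_ends=1,
                           two_shot_trips=8, three_shot_trips=1))
```

If every trip were a two-shot foul the weight would be exactly 0.5; And 1s and three-shot fouls pull it down, and missed front ends push it up.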

Alternating possession should be a turnover if the defense gets the ball.

My guess is your count from the PBP is slightly off.

When I said 0.475 was a magic constant I just meant that it was empirically determined from the ratio of free throw situations, so I wasn't surprised if I didn't get the same number.

I did find and fix the jump ball data, but I haven't had a chance to re-run the regression yet.