Wednesday, September 21, 2011

Statistical Prediction: Pace-Adjusted Statistics & The Four Factors

There is much talk in sports statistics circles about pace-adjusted statistics.  As Wikipedia puts it:
"A key tenet for many modern basketball analysts is that basketball is best evaluated at the level of possessions."
The notion here is that because teams play at different paces, game-level statistics can be misleading.  A team that averages 95 points per game is not necessarily better than one that averages 78 points per game.  The higher-scoring team may simply be playing at a much faster pace.  We can account for this by measuring statistics per possession rather than per game.
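
To make the pace argument concrete, here is a minimal sketch of a per-possession conversion. The possession estimate (field goal attempts, minus offensive rebounds, plus turnovers, plus a fraction of free throw attempts) is one commonly used approximation and is an assumption here, as are the 0.475 free-throw weight, the function names, and the two made-up box scores.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.475):
    """Estimate possessions from box-score totals.

    ft_weight is an assumed constant: 0.475 is often used for college
    games and 0.44 for the NBA.
    """
    return fga - orb + tov + ft_weight * fta


def per_possession(stat_total, possessions):
    """Convert a raw total into a per-possession rate."""
    return stat_total / possessions


# Hypothetical box scores, invented for illustration only.
fast_team = {"pts": 95, "fga": 70, "orb": 12, "tov": 14, "fta": 24}
slow_team = {"pts": 78, "fga": 52, "orb": 8, "tov": 10, "fta": 16}

for name, box in (("fast", fast_team), ("slow", slow_team)):
    poss = estimate_possessions(box["fga"], box["orb"], box["tov"], box["fta"])
    print(name, round(poss, 1), round(per_possession(box["pts"], poss), 3))
```

With these invented numbers, the 78-point team uses far fewer possessions and ends up scoring more points per possession than the 95-point team, which is exactly the situation that per-game statistics hide.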

While this makes a lot of intuitive sense, I always like to test my intuitions.  So I took the same set of statistics used in this posting and re-calculated them as per-possession statistics.  (See here for how to estimate the number of possessions in a game.)  Then I ran the prediction model using the per-possession statistics.  (Obviously some statistics, like "Field Goal Shooting Percentage," are not calculated on a per-game basis, so those don't get pace-adjusted.)  Here is the performance comparison:

Predictor                                       % Correct   MOV Error
Govan + Averaging                               73.5%       10.80
Statistical prediction (per-game stats)         72.2%       11.09
Statistical prediction (per-possession stats)   72.2%       11.10

As you can see, the two approaches were indistinguishable.  Not only was performance nearly identical, but they both selected the same statistics for the prediction model.  So at least for this case, it doesn't appear that adjusting for pace improves performance.
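
For readers who want to run this kind of comparison themselves, here is a rough sketch of how the two table columns could be computed from predicted and actual margins of victory. It assumes that "% Correct" is the share of games where the predicted margin has the same sign as the actual margin, and that "MOV Error" is a root-mean-square error; the post does not spell out either definition, and the function names and sample margins are hypothetical.

```python
import math


def percent_correct(predicted, actual):
    """Share of games (in percent) where the predicted winner matches.

    Margins are from the home team's point of view, so a pick is correct
    when predicted and actual margins have the same sign.
    """
    correct = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    return 100.0 * correct / len(actual)


def mov_error(predicted, actual):
    """Root-mean-square error of the predicted margin (an assumed metric)."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical margins, invented for illustration only.
predicted = [5.0, -3.0, 8.0, 2.0]
actual = [7, 4, 10, -1]

print(percent_correct(predicted, actual))
print(round(mov_error(predicted, actual), 2))
```

Running the same two functions over the per-game and per-possession predictions would give a side-by-side comparison like the one in the table.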

My guess is that the relative unimportance of pace is due to the shot clock and the copycat nature of coaching.  There probably isn't enough pace variation across teams to make it a significant factor.

If you search around for "pace-adjusted statistics" you'll eventually stumble across Ken Pomeroy's Four Factors page.  The four factors are derived statistics that are intended to give additional insight into how teams play.  The factors are:
  • Effective field goal percentage
  • Turnover percentage
  • Offensive rebounding percentage
  • Free throw rate   
(Definitions can be found on Ken Pomeroy's page.)
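
As a rough sketch, the four factors can be computed from box-score totals as below. The formulas are the commonly cited definitions and are an assumption here, not copied from Ken Pomeroy's page: effective FG% gives made three-pointers an extra half credit, turnover % is turnovers per possession, offensive rebounding % is the share of available rebounds at the offensive end, and free throw rate is taken as free throw attempts per field goal attempt. The function name, dictionary keys, and sample numbers are hypothetical.

```python
def four_factors(team, opp_drb, possessions):
    """Compute the four factors for one team from box-score totals.

    team        -- dict with keys fgm, fg3m, fga, tov, orb, fta
    opp_drb     -- the opponent's defensive rebounds
    possessions -- the team's (estimated) number of possessions
    """
    fga = team["fga"]
    return {
        # Made threes count as 1.5 made field goals.
        "efg_pct": (team["fgm"] + 0.5 * team["fg3m"]) / fga,
        # Turnovers per possession.
        "tov_pct": team["tov"] / possessions,
        # Share of the team's own misses that it rebounds.
        "orb_pct": team["orb"] / (team["orb"] + opp_drb),
        # Free throw attempts per field goal attempt (one common variant).
        "ft_rate": team["fta"] / fga,
    }


# Hypothetical totals, invented for illustration only.
team = {"fgm": 28, "fg3m": 8, "fga": 60, "tov": 12, "orb": 10, "fta": 20}
factors = four_factors(team, opp_drb=25, possessions=70)
for name, value in factors.items():
    print(name, round(value, 3))
```

Note that offensive rebounding % needs the opponent's defensive rebounds, so unlike the other three it cannot be computed from one team's box score alone.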

"Effective FG%" is not of interest to me because the linear regression can adjust the relative importance of field goals versus three-point attempts.  "Turnover %" is turnovers per possession; that's one of the statistics I calculated as part of the per-possession statistics experiment above.  (It had no value in the predictor, fwiw.)  "Offensive rebounding %" is a more interesting statistic, and since offensive rebounds are used by the statistical prediction model, this seems like a worthwhile statistic to investigate.  "Free throw rate" seems to capture some notion about how often a team draws a foul.  I think that's already captured, but it isn't difficult to generate this statistic.

If I generate these two new statistics and run the prediction model, I find that performance remains the same, but the "Offensive rebounding %" statistics replace the per-game or per-possession offensive rebounding statistics.  ("Free throw rate" has no predictive value and is eliminated in the linear regression.)

Since three-point shooting percentages are used in the predictor, I decided to define a new statistic to capture how much a team relies on the three-point shot (and how it affects its opponents' use of the three-point shot).  I defined this as:
Offensive Balance = (# 3 Pt Attempts) / (# FG Attempts)
and re-ran the predictor.  The new statistic has no predictive value.  An alternative formulation is to look at the made 3 pointers versus the made field goals:

Offensive Balance = (3 * (# 3 Pt Made)) / (2 * (# FG Attempts))
but again, this statistic has no predictive value.
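
Both formulations are easy to compute from box-score totals. The sketch below follows the two formulas as written above, reading the second as (3 x threes made) divided by (2 x field goal attempts); that grouping is an assumption, and the function names and sample totals are hypothetical.

```python
def offensive_balance_attempts(fg3a, fga):
    """Share of field goal attempts that are three-point attempts."""
    return fg3a / fga


def offensive_balance_made(fg3m, fga):
    """Points from made threes relative to two points per field goal attempt.

    Assumes the grouping (3 * made threes) / (2 * field goal attempts).
    """
    return (3 * fg3m) / (2 * fga)


# Hypothetical totals, invented for illustration only.
fg3a, fg3m, fga = 20, 8, 60

print(round(offensive_balance_attempts(fg3a, fga), 3))
print(round(offensive_balance_made(fg3m, fga), 3))
```

The same two functions can be applied to the opponent's totals to get the defensive version of each statistic.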

I'm open to suggestions if anyone out there has any thoughts on similar "derived statistics" that might be of value in prediction.
