Calculating the Massey Rating
In this posting, I'll demonstrate how to calculate the Massey rating as described in (Massey 1997), which you can download from Papers. (His current rating system may be somewhat different.) I was a little surprised to see that I haven't ever written about the Massey ratings, so this will be a chance to do that. This is also an opportunity for me to experiment with creating an iPython Notebook!
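You can read the details of the Massey rating system in his paper, but the basic notion is that the expected outcome of a game (in terms of Margin of Victory) is equal to the difference in ratings between the two teams:
$$e = R_i - R_j$$
Here $R_i$ and $R_j$ are the ratings of teams $T_i$ and $T_j$, and $e$ is the expected margin by which $T_i$ beats $T_j$.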
After a few games, we have some actual outcomes and we can use these to calculate the team ratings. If we think of every actual game as an equation with a value for $e$ (the actual margin by which $T_i$ won the game), then we have a set of linear equations with one variable ($R_i$) for each team. Once we have more games than teams we can solve for the $R_i$. In the rest of this notebook I'll walk through how to do this in Python.
The first thing we'll need is some sample data for testing. In this case I'm going to have 10 games for 6 teams, with the teams numbered 0 through 5. I'll use a simple representation of a game as a list of three numbers: the winning team, the losing team, and the margin of victory.
games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]
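To make the "every game is an equation" idea concrete, here is how the first three games in this list read as equations. This is just a worked illustration, taking each triple as winner, loser, margin of victory:

```latex
\begin{aligned}
R_0 - R_3 &= 3 \\
R_1 - R_4 &= 21 \\
R_1 - R_5 &= 5
\end{aligned}
```

The remaining seven games add seven more equations of the same form, giving 10 equations in the 6 unknown ratings.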
First, let's initialize an array of zeros of the right size.
import numpy as np
M = np.zeros([10, 6])  # one row per game, one column per team
print(M)
def buildGamesMatrix(games, M):
    # Fill in one row per game: +1 for the winner, -1 for the loser.
    row = 0
    for g in games:
        M[row, g[0]] = 1
        M[row, g[1]] = -1
        row += 1
    return M

buildGamesMatrix(games, M)
print(M)
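def buildGamesMatrix(games, num_teams):
    # Same as before, but allocate the matrix inside the function
    # so the caller only has to supply the number of teams.
    M = np.zeros([len(games), num_teams])
    row = 0
    for g in games:
        M[row, g[0]] = 1
        M[row, g[1]] = -1
        row += 1
    return M

M = buildGamesMatrix(games, 6)
print(M)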
def buildOutcomes(games):
    # Collect the margin of victory (third entry) of every game.
    E = np.zeros([len(games)])
    row = 0
    for g in games:
        E[row] = g[2]
        row += 1
    return E

E = buildOutcomes(games)
print(E)
Together $M$ and $E$ represent a set of linear equations. Now how do we solve these equations to get the team ratings?
Generally speaking, a set of linear equations like this won't have an exact solution. But we can find the closest approximation through a process called "linear least squares". What this does is find the ratings that minimize the sum of the squared errors over all the games. There are a number of ways to do this in various Python libraries, but one straightforward way is to use the least squares solver in NumPy:
# rcond=None (NumPy 1.14 or later) selects the machine-precision cutoff explicitly
bh = np.linalg.lstsq(M, E, rcond=None)[0]
print(bh)
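As a sanity check on what the solver returns, here is a minimal sketch that rebuilds the same system and confirms two standard properties of a least squares solution. First, it satisfies the normal equations $M^T M R = M^T E$. Second, because adding a constant to every rating leaves every difference unchanged, the ratings are only pinned down up to that constant, and the solver returns the minimum-norm answer, whose ratings sum to zero. The variable names (ratings, rank and so on) are my own for illustration, and rcond=None assumes NumPy 1.14 or later.

```python
import numpy as np

games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]
num_teams = 6

# Build the games matrix and the outcomes vector in one pass.
M = np.zeros((len(games), num_teams))
E = np.zeros(len(games))
for row, (winner, loser, margin) in enumerate(games):
    M[row, winner] = 1
    M[row, loser] = -1
    E[row] = margin

# lstsq returns (solution, residuals, rank, singular values).
ratings, _, rank, _ = np.linalg.lstsq(M, E, rcond=None)

# A least squares solution satisfies the normal equations M^T M R = M^T E.
satisfies_normal_equations = np.allclose(M.T.dot(M).dot(ratings), M.T.dot(E))

print(rank)                        # one less than the number of teams here
print(satisfies_normal_equations)
print(abs(ratings.sum()) < 1e-9)   # the minimum-norm ratings sum to zero
```

The rank comes out one short of the number of teams because only rating differences matter, so the individual numbers are only meaningful relative to each other.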
How good are these ratings? We can test this by looking at how well the ratings predict the actual outcomes. For example, in our first game, $T_0$ played $T_3$ and won by 3 points. According to our model, the expected outcome was $R_0 - R_3$, which is $-18.55 - (-9.75)$ or about $-8.8$ points. So in this case, the prediction was off by almost 12 points (!). We can find the Mean Absolute Error (MAE) by doing this calculation for every game, adding up the absolute errors, and then dividing by the number of games:
sum(map(abs, M.dot(bh)-E))/10
M.dot(bh) puts the calculated ratings bh into the system of linear equations M to calculate the predicted margins of victory. Then the actual margins E are subtracted to determine the errors. The map takes the absolute values of the errors, and the sum and /10 calculate the mean. The result is an MAE of about 9.3 points.
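The same calculation can also be written with NumPy's own vectorized functions, which avoids hard-coding the number of games. This is a sketch under the same data as above; the names predicted, errors and mae are just illustrative, and rcond=None assumes NumPy 1.14 or later.

```python
import numpy as np

games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]

# Rebuild the system so this example runs on its own.
M = np.zeros((len(games), 6))
E = np.zeros(len(games))
for row, (winner, loser, margin) in enumerate(games):
    M[row, winner] = 1
    M[row, loser] = -1
    E[row] = margin

ratings = np.linalg.lstsq(M, E, rcond=None)[0]

predicted = M.dot(ratings)      # predicted margin for every game
errors = predicted - E          # signed prediction errors
mae = np.mean(np.abs(errors))   # mean absolute error over all games

print(round(float(mae), 2))     # about 9.33
```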
How can we use the ratings to make predictions? If we want to predict the outcome when $T_1$ plays $T_4$, we go back to our equation and plug in the two ratings:
$$e = R_1 - R_4$$
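In code, plugging into the equation is just a lookup and a subtraction. A minimal sketch, again rebuilding the ratings so it runs on its own (expected_margin is an illustrative name, and rcond=None assumes NumPy 1.14 or later):

```python
import numpy as np

games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]

M = np.zeros((len(games), 6))
E = np.zeros(len(games))
for row, (winner, loser, margin) in enumerate(games):
    M[row, winner] = 1
    M[row, loser] = -1
    E[row] = margin

bh = np.linalg.lstsq(M, E, rcond=None)[0]

# Expected margin when T_1 plays T_4: the difference of their ratings.
expected_margin = bh[1] - bh[4]
print(round(float(expected_margin), 2))  # about 6.5, i.e. T_1 is favored
```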
We can also compute predictions with a matrix product. First we have to create a new matrix representing the game (or games) we want to predict. Suppose that we want to predict a game between $T_0$ and $T_5$. We create a corresponding matrix, and (arbitrarily) mark $T_0$ as the winner and $T_5$ as the loser:
P = np.array([[1, 0, 0, 0, 0, -1]])
Then we multiply this matrix by the ratings bh to get a prediction:
print(P.dot(bh))
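If we build the prediction matrix the other way around, with $T_5$ marked as the winner and $T_0$ as the loser, we get the same margin with the opposite sign: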
P = np.array([[-1, 0, 0, 0, 0, 1]])
print(P.dot(bh))
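Since the prediction matrix can hold several games at once, it can be handy to build it from a list of matchups. The helper below, build_prediction_matrix, is a hypothetical function of my own (not something from Massey's paper or from NumPy), sketched in the same style as the games matrix. A positive prediction favors the first team of the pair, a negative one the second.

```python
import numpy as np

games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]

M = np.zeros((len(games), 6))
E = np.zeros(len(games))
for row, (winner, loser, margin) in enumerate(games):
    M[row, winner] = 1
    M[row, loser] = -1
    E[row] = margin

bh = np.linalg.lstsq(M, E, rcond=None)[0]

def build_prediction_matrix(matchups, num_teams):
    # One row per matchup: +1 for the first team, -1 for the second.
    P = np.zeros((len(matchups), num_teams))
    for row, (team_a, team_b) in enumerate(matchups):
        P[row, team_a] = 1
        P[row, team_b] = -1
    return P

# T_0 vs T_5, the same game reversed, and T_1 vs T_4.
P = build_prediction_matrix([(0, 5), (5, 0), (1, 4)], 6)
predictions = P.dot(bh)
print(np.round(predictions, 2))  # roughly -31.05, 31.05 and 6.5
```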
This covers the basic notion of the Massey ratings, but there are a few subtleties that I'll go over in the next installment.