Wednesday, September 23, 2015

Calculating the Massey Rating, Part 2

I've gotten consumed in the last few days with a bug in the predictor, and I forgot I was intending to post the second part of this notebook. The good news is that in the meantime I figured out how to make these notebooks play better with Blogger.

In this posting I'll expand the basic Massey rating with a few additions. First, let's recover the game list and the definitions from the previous posting.

In [70]:
games = [[0, 3, 3], [1, 4, 21], [1, 5, 5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, 8], [4, 3, 15], [5, 1, 21], [5, 4, 8]]

import numpy as np

def buildGamesMatrix(games, num_teams):
    M = np.zeros([len(games), num_teams])
    row = 0
    for g in games:
        M[row, g[0]] = 1
        M[row, g[1]] = -1
        row += 1
    return M

M = buildGamesMatrix(games,6)

def buildOutcomes(games):
    E = np.zeros([len(games)])
    row = 0
    for g in games:
        E[row] = g[2]
        row += 1
    return E

E = buildOutcomes(games)

bh = np.linalg.lstsq(M,E)[0]
print bh
[-18.55833333  11.74166667  -1.15833333  -9.75833333   5.24166667
  12.49166667]

One point that Massey makes in his thesis is that it is sometimes impossible to solve this set of equations. This occurs when the teams are divided into two or more subsets that have never played each other. (In math terms, we can only solve the equations when the graph of teams is connected.) In practice this isn't a problem for me, because I don't start calculating ratings until about 800 games into the season, and by that time the graph is connected. But it can be a problem in a sport like football, so how do we address this?

Massey recommends forcing the graph to be connecting by arbitrarily changing one row of $M$ to be all ones, and the corresponding element of $E$ to zero. For example:

In [71]:
print "\nBefore forcing connectivity:\n"
print M
print 
print E
M[9:] = 1.0
E[9] = 0.0
print "\nAfter forcing connectivity: \n"
print M
print
print E
Before forcing connectivity:

[[ 1.  0.  0. -1.  0.  0.]
 [ 0.  1.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.]
 [-1.  0.  1.  0.  0.  0.]
 [ 0.  0.  1. -1.  0.  0.]
 [-1.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1. -1.]
 [ 0.  0.  0. -1.  1.  0.]
 [ 0. -1.  0.  0.  0.  1.]
 [ 0.  0.  0.  0. -1.  1.]]

[  3.  21.   5.  13.  13.  25.   8.  15.  21.   8.]

After forcing connectivity: 

[[ 1.  0.  0. -1.  0.  0.]
 [ 0.  1.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.]
 [-1.  0.  1.  0.  0.  0.]
 [ 0.  0.  1. -1.  0.  0.]
 [-1.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1. -1.]
 [ 0.  0.  0. -1.  1.  0.]
 [ 0. -1.  0.  0.  0.  1.]
 [ 1.  1.  1.  1.  1.  1.]]

[  3.  21.   5.  13.  13.  25.   8.  15.  21.   0.]

Now we can re-run our ratings with this new $M$.

In [72]:
bh = np.linalg.lstsq(M,E)[0]
print bh
[-18.43333333  11.56666667  -1.03333333  -9.63333333   5.36666667
  12.16666667]

As you can see, this has (not unsurprisingly) changed the ratings. In addition to making the graph connected, this change also tries to make the mean of the ratings equal to 0.

I prefer a variant approach which adds the connectivity fix as a new "game" instead of overwriting the last game. This avoids impacting the ratings.

In [73]:
M = buildGamesMatrix(games,6)
E = buildOutcomes(games)
M = np.vstack((M, [1, 1, 1, 1, 1, 1]))
print M
E = np.append(E, 0)
print E
[[ 1.  0.  0. -1.  0.  0.]
 [ 0.  1.  0.  0. -1.  0.]
 [ 0.  1.  0.  0.  0. -1.]
 [-1.  0.  1.  0.  0.  0.]
 [ 0.  0.  1. -1.  0.  0.]
 [-1.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  1. -1.]
 [ 0.  0.  0. -1.  1.  0.]
 [ 0. -1.  0.  0.  0.  1.]
 [ 0.  0.  0.  0. -1.  1.]
 [ 1.  1.  1.  1.  1.  1.]]
[  3.  21.   5.  13.  13.  25.   8.  15.  21.   8.   0.]

And now we'll calculate the ratings again.

In [74]:
bh = np.linalg.lstsq(M,E)[0]
print bh
[-18.55833333  11.74166667  -1.15833333  -9.75833333   5.24166667
  12.49166667]

Now the connectivity fix doesn't affect the ratings. (In this case, anyway. If the matrix wasn't well-connected, or the ratings didn't already sum to zero, it would still have an effect.)

Moving on, you may have noticed in the previous notebook about the Massey rating that there was no mention of whether teams were playing at home or away. Games were expressed as a winner, a loser and a margin of victory. So let's add home & away. To start with, we'll modify our game representation so that instead of

$[Winner, Loser, Margin]$

we have

$[Home, Loser, MOV]$

where $MOV$ is the more familiar definition of the score of the home team minus the score of the away team. I've modified the games array from the previous posting to make a few games wins by the Away team. Since I choose to put Home first in the new representation, and changed Margin to MOV, nothing actually changes in the code. So let's calculate the ratings for this new set of games.

In [80]:
games = [[0, 3, 3], [1, 4, 21], [1, 5, -5], [2, 0, 13], [2, 3, 13],
         [3, 0, 25], [4, 5, -8], [4, 3, 15], [5, 1, 21], [5, 4, -8]]
M = buildGamesMatrix(games,6)
E = buildOutcomes(games)
M = np.vstack((M, [1, 1, 1, 1, 1, 1]))
E = np.append(E, 0)

bh = np.linalg.lstsq(M,E)[0]
print bh
[-18.35   9.45  -0.95  -9.55   5.45  13.95]

So we now have Home/Away represented and we have forced connectivity. Another issue we might want to address is that the Massey rating system doesn't have the notion of a "home court advantage". You can see that it predicts the same game outcome for $Team_i$ versus $Team_j$ regardless of whether $Team_i$ is Home or $Team_j$ is Home.

One way we can try to address that shortcoming is to add a HCA constant to the expected outcome equation, like so:

$e_{ha} = R_{home} - R_{away} + C_{HCA}$

Here $C_{HCA}$ is a constant for all games. We can represent this as a new column in $M$ of all 1s (because it affects every game), except in the connectivity row (Do you see why? $C_{HCA}$ captures a bias in scoring data, so we don't want to force it to sum to zero with the ratings.).

In [81]:
M = buildGamesMatrix(games,6)
E = buildOutcomes(games)
# Adjust for connectivity
M = np.vstack((M, [1, 1, 1, 1, 1, 1]))
E = np.append(E, 0)
# Calculate rating & error before HCA
bh = np.linalg.lstsq(M,E)[0]
print "Error rating without HCA:\n"
print sum(map(abs, M.dot(bh)-E))/10
# Adjust for HCA
M = np.hstack((M, [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1], [0]]))
print "\nM with HCA column:\n"
print M
bh = np.linalg.lstsq(M,E)[0]
print "\nRatings with new HCA:\n"
print bh
Error rating without HCA:

8.78

M with HCA column:

[[ 1.  0.  0. -1.  0.  0.  1.]
 [ 0.  1.  0.  0. -1.  0.  1.]
 [ 0.  1.  0.  0.  0. -1.  1.]
 [-1.  0.  1.  0.  0.  0.  1.]
 [ 0.  0.  1. -1.  0.  0.  1.]
 [-1.  0.  0.  1.  0.  0.  1.]
 [ 0.  0.  0.  0.  1. -1.  1.]
 [ 0.  0.  0. -1.  1.  0.  1.]
 [ 0. -1.  0.  0.  0.  1.  1.]
 [ 0.  0.  0.  0. -1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  0.]]

Ratings with new HCA:

[-12.86923077   4.54615385  -2.39230769  -4.06923077   4.00769231
  10.77692308   6.92307692]

$C_{HCA}$ is the last term in $bh$ and you can see that in this example it is quite substantial. Did adding this make our rating system any more accurate? Let's calculate the error with $C_{HCA}$:

In [83]:
sum(map(abs, M.dot(bh)-E))/10
Out[83]:
7.987692307692309

So yes, in this case adding $C_{HCA}$ to our model resulted in an improvement of about 0.8 in our MAE.

As a last note, if you read Massey's thesis you'll see he's actually solving a set of equations that look like this:

$M^T * E = M^T * M*R$

which is to say that he's multiplied both sides by $M^T$ (the transpose of $M$). This has the effect of reducing the number of equations from the number of games to the number of teams, as you can see here:

In [86]:
print "M is 11 rows long:\n"
print M
print "\nMt*M is 6 rows long:\n"
print M.T.dot(M)
M is 11 rows long:

[[ 1.  0.  0. -1.  0.  0.  1.]
 [ 0.  1.  0.  0. -1.  0.  1.]
 [ 0.  1.  0.  0.  0. -1.  1.]
 [-1.  0.  1.  0.  0.  0.  1.]
 [ 0.  0.  1. -1.  0.  0.  1.]
 [-1.  0.  0.  1.  0.  0.  1.]
 [ 0.  0.  0.  0.  1. -1.  1.]
 [ 0.  0.  0. -1.  1.  0.  1.]
 [ 0. -1.  0.  0.  0.  1.  1.]
 [ 0.  0.  0.  0. -1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  0.]]

Mt*M is 6 rows long:

[[  4.   1.   0.  -1.   1.   1.  -1.]
 [  1.   4.   1.   1.   0.  -1.   1.]
 [  0.   1.   3.   0.   1.   1.   2.]
 [ -1.   1.   0.   5.   0.   1.  -2.]
 [  1.   0.   1.   0.   5.  -1.   0.]
 [  1.  -1.   1.   1.  -1.   5.   0.]
 [ -1.   1.   2.  -2.   0.   0.  10.]]

Applying the same operation to both sides of the equation doesn't change the answer, so you get the same result if you solve the new formulation.

In [87]:
bh = np.linalg.lstsq(M.T.dot(M),M.T.dot(E))[0]
print "\nRatings with new formulation:\n"
print bh
Ratings with new formulation:

[-12.86923077   4.54615385  -2.39230769  -4.06923077   4.00769231
  10.77692308   6.92307692]

The advantage (at least in 1997 when Massey wrote his thesis) was that it was easier/faster to solve the smaller set of equations. I'm not sure if that's still true and it might be that Python's linear solver is smart enough to do this itself. But you'll get the same result either way.

In [ ]:
 

2 comments:

  1. Look at you and your fancy Jupyter notebook self! (Note, the cool kids call it Jupyter these days, not IPython). Looks great on my end, I think you got the layout sorted.. Glad to see the Python conversion is going well! I'm curious how you find it vs your original Lisp code? I figured between Pandas and NumPy you have to be pretty happy with the ease of a lot of things (though admittedly I know very little about Lisp, and what similar functionality might exist). Did you track down the bug in your predictor yet?

    ReplyDelete
  2. Thanks, Brandon! I edited the CSS code to take out all the fixed "width:" attributes and that fixed the formatting problem, although I noticed that it also wiped out the header at the top of the blog (!). Oh well, good enough.

    As far as the bug goes, it's really puzzling. I add a new attribute to my model that's just two existing attributes multiplied together. I build a model with that and the error for certain years shoots through the roof, i.e., it goes from something like 11 to (say) 4000. But inspection shows nothing particularly wonky with the new attribute. Normalizing the inputs to the model mostly fixes the problem, but I can't see why since the new attribute is in the same range as the existing attributes.

    At any rate it was more a curiosity than anything and I've moved on :-)

    ReplyDelete