Danny's code differs in a couple of ways from a straightforward batch gradient descent. One difference is that Danny has added in a regularization step. Danny made this comment about regularization:
> In addition, I regularize the latent vectors by adding independent zero-mean Gaussian priors (or equivalently, a penalty proportional to the squared L2 norm of the latent vectors). This is known to improve these matrix-factorization-like models by encouraging them to be simpler and less willing to pick up on spurious characteristics of the data.

I theorize that with a large, diverse training set such as I'm using, regularization is unnecessary. To test that, I re-ran the PMM without any regularization:
| Predictor | % Correct | MOV Error |
| --- | --- | --- |
| PMM (w/o regularization) | 71.8% | 11.20 |
Performance is almost identical to the regularized version, so regularization indeed seems to add no value for this dataset.
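The regularized update Danny describes can be sketched in a few lines. This is a minimal illustration, not Danny's actual code: the data, matrix dimensions, learning rate, and regularization strength below are all made up for the example. The point is where the penalty enters: each batch gradient step also shrinks the latent vectors toward zero, and setting `lam = 0` recovers the unregularized run reported in the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teams, k = 8, 3
R = rng.normal(size=(n_teams, n_teams))        # toy matchup outcomes (illustrative data)
U = rng.normal(scale=0.1, size=(n_teams, k))   # latent vectors, one per team
V = rng.normal(scale=0.1, size=(n_teams, k))

lr, lam = 0.01, 0.1                            # learning rate, regularization strength
for _ in range(500):
    err = R - U @ V.T                          # residuals over the whole batch
    # Gradient of squared error plus lam * ||.||^2 penalty on the latent
    # vectors (the zero-mean Gaussian prior). Compute both gradients from
    # the same residual, then apply the updates.
    grad_U = err @ V - lam * U
    grad_V = err.T @ U - lam * V
    U += lr * grad_U
    V += lr * grad_V
```

Because the penalty term `-lam * U` pulls every latent vector toward zero on each step, large coordinates must "pay for themselves" by reducing prediction error, which is the sense in which the model is encouraged to be simpler.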