Danny's code differs in a couple of ways from a straightforward batch gradient descent. One difference is that he adds a regularization step, about which he made this comment:
> In addition, I regularize the latent vectors by adding independent zero-mean Gaussian priors (or equivalently, a linear penalty on the squared L2 norm of the latent vectors). This is known to improve these matrix-factorization-like models by encouraging them to be simpler, and less willing to pick up on spurious characteristics of the data.

I theorize that with a large, diverse training set such as the one I'm using, regularization is unnecessary. To test that, I re-ran the PMM without any regularization:
| Predictor | % Correct | MOV Error |
|---|---|---|
| PMM | 71.7% | 11.23 |
| PMM (w/o regularization) | 71.8% | 11.20 |
Performance is nearly identical, so regularization doesn't appear to add any value, at least with a training set this large.
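For concreteness, here is a minimal sketch of how that kind of squared-L2 penalty typically enters a batch-gradient-descent update for the latent vectors in a matrix-factorization-style model. The function and variable names are hypothetical, and this is not Danny's actual code; setting `reg=0.0` corresponds to the unregularized run in the table above.

```python
import numpy as np

def factorize(R, mask, k=10, steps=500, lr=0.01, reg=0.1):
    """Factor R ~ U @ V.T over the observed entries (mask == 1).

    reg is the coefficient on the squared-L2 penalty applied to the
    latent vectors; reg=0.0 gives the unregularized variant.
    """
    m, n = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    for _ in range(steps):
        E = mask * (R - U @ V.T)       # residuals on observed entries only
        grad_U = -E @ V + reg * U      # data-fit gradient plus penalty gradient
        grad_V = -E.T @ U + reg * V
        U -= lr * grad_U               # batch gradient-descent update
        V -= lr * grad_V
    return U, V
```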