3
votes

I am implementing matrix factorization to predict a movie rating by a reviewer. The dataset is taken from MovieLens (http://grouplens.org/datasets/movielens/). This is a well-studied recommendation problem, so I am implementing this matrix factorization method purely for learning purposes.

I model the cost function as the root-mean-square error between the predicted ratings and the actual ratings in the training dataset. I use the scipy.optimize.minimize function (with the conjugate gradient method) to factor the movie rating matrix, but this optimization tool is too slow even for a dataset with only 100K ratings. I plan to scale my algorithm to a dataset with 20 million ratings.
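
For reference, here is a rough sketch of my current setup (the variable names, the rank k, and the toy data are just illustrative; I also use a sum of squared errors over observed entries, which has the same minimizer as the RMSE):

    import numpy as np
    from scipy.optimize import minimize

    # Toy stand-in for the MovieLens data: R holds ratings, M marks observed entries
    num_users, num_movies, k = 20, 30, 5
    rng = np.random.default_rng(0)
    R = rng.integers(1, 6, size=(num_users, num_movies)).astype(float)
    M = (rng.random((num_users, num_movies)) < 0.3).astype(float)

    def unpack(params):
        # Split the flat parameter vector into user factors U and movie factors V
        U = params[:num_users * k].reshape(num_users, k)
        V = params[num_users * k:].reshape(num_movies, k)
        return U, V

    def my_cost_function(params):
        # Squared error over observed ratings only
        U, V = unpack(params)
        err = M * (U @ V.T - R)
        return 0.5 * np.sum(err ** 2)

    def my_gradient_function(params):
        U, V = unpack(params)
        err = M * (U @ V.T - R)
        return np.concatenate([(err @ V).ravel(), (err.T @ U).ravel()])

    x0 = 0.1 * rng.standard_normal(num_users * k + num_movies * k)
    res = minimize(my_cost_function, x0, jac=my_gradient_function, method='CG')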

I have been searching for a Python-based solution for stochastic gradient descent, but the SGD implementation I found in scikit-learn does not allow me to use my custom cost and gradient functions.

I can implement my own stochastic gradient descent, but I am checking with you guys whether a tool for this already exists.

Basically, I am wondering if there is an API similar to this:

optimize.minimize(my_cost_function,
                  my_input_param,
                  jac=my_gradient_function,
                  ...)

Thanks! Un

2
There are two things you should look at: (1) whether or not the matrix libraries are vectorized/parallelized, and (2) the convergence of your gradient step size. Plot the cost function versus iteration to see whether the step size can make it go faster. You might be taking too many small steps to converge to a solution. – duffymo
Thanks for the reply. Good point on the step size. – suthee

2 Answers

1
votes

This is so simple to implement (at least the vanilla method) that I don't think there is a "framework" around it. The update is just

    my_input_param -= alpha * my_gradient_function(my_input_param)

Maybe you want to have a look at Theano, which will do the differentiation for you. Depending on what you want to do, it might be a bit overkill, though.
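
For what it's worth, here is a minimal sketch of that loop for the matrix-factorization case in the question (the names, learning rate, and toy data are just illustrative):

    import random
    import numpy as np

    # Toy observed ratings as (user, movie, rating) triples
    ratings = [(0, 1, 4.0), (0, 2, 3.0), (1, 0, 5.0), (2, 2, 2.0)]
    num_users, num_movies, k = 3, 3, 2
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((num_users, k))   # user factors
    V = 0.1 * rng.standard_normal((num_movies, k))  # movie factors

    alpha = 0.01  # learning rate
    for epoch in range(200):
        random.shuffle(ratings)         # visit ratings in random order
        for u, m, r in ratings:
            err = U[u] @ V[m] - r       # prediction error for one rating
            gu = err * V[m]             # gradient of 0.5 * err**2 w.r.t. U[u]
            gv = err * U[u]             # gradient w.r.t. V[m]
            U[u] -= alpha * gu          # step against the gradient
            V[m] -= alpha * gv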

1
votes

I've been trying to do something similar in R but with a different custom cost function.

As I understand it, the key is to compute the gradient and step in the opposite direction, which takes you towards the local minimum.

With linear regression (y = mx + c) and a least-squares cost function, the cost for one example is (mx + c - y)^2. The partial derivative of this with respect to m is 2x(mx + c - y). With the more traditional machine-learning notation where m = theta, this gives the vectorized R update (the constant factor of 2 gets absorbed into the learning rate):

    theta <- theta - learning_rate * t(X) %*% (X %*% theta - y) / length(y)
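
Since the question is about Python, here is the same update in NumPy in case it helps map between the two (the toy data here is made up):

    import numpy as np

    # Toy data: a leading column of ones lets theta[0] play the role of c
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.standard_normal(50)])
    y = 3.0 + 2.0 * X[:, 1] + 0.1 * rng.standard_normal(50)

    theta = np.zeros(2)
    learning_rate = 0.1
    for _ in range(1000):
        # Same as the R line above: mean-squared-error gradient step
        theta -= learning_rate * X.T @ (X @ theta - y) / len(y)
    # theta should end up close to [3.0, 2.0]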

I don't know this for sure, but I would assume that for linear regression with a cost function of sqrt(mx + c - y), the gradient step again uses the partial derivative with respect to m, which I believe is x / (2 * sqrt(mx + c - y))

If any or all of this is incorrect, please (anybody) correct me. This is something I am trying to learn myself, and I would appreciate knowing if I'm heading in completely the wrong direction.
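
One way to sanity-check a derivation like this numerically (my suggestion, not something from the question) is scipy.optimize.check_grad, which compares an analytic gradient against a finite-difference estimate:

    import numpy as np
    from scipy.optimize import check_grad

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(20), rng.standard_normal(20)])
    y = rng.standard_normal(20)

    def cost(theta):
        return 0.5 * np.mean((X @ theta - y) ** 2)

    def grad(theta):
        return X.T @ (X @ theta - y) / len(y)

    # Prints a value near zero if grad really is the gradient of cost
    print(check_grad(cost, grad, np.array([0.5, -0.3])))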