6
votes

I am using Apache Spark (Pyspark API for Python) ALS MLLIB to develop a service that performs live recommendations for anonym users (users not in the training set) in my site. In my usecase I train the model on the User ratings in this way:

from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
ratings = df.map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
rank = 10 
numIterations = 10
model = ALS.trainImplicit(ratings, rank, numIterations)

Now, each time an anonym user selects an item in the catalogue, I want to fold-in its vector in the ALS model and get the recommendations (just like the recommendProducts() call), but avoiding the re-training of the whole model.

Is there any way to easily do the fold-in of the new anonym user vector after training the ALS model in Apache Spark?

Thanks in advance

1
To make things clear: do you just want to run your model on a test set, or do you want some kind of "model serving" solution to power a web site in real time?Samson Scharfrichter
I definitively want the "model serving" solution to power a web site in real time!Valerio Storch
I'm flagging this question as a duplicate, hopefully it will get more exposure by being linked: stackoverflow.com/questions/30509335/…Chris Snow

1 Answers

0
votes

There are a few Open Source "model server" solutions that I have seen advertised, but have no hands-on experience on. I also heard of a commercial offering, but can't just remember the name right now.
So make your own opinion, and keep a watch on possible alternatives.

PredictionIO (the start-up has been gobbled by SalesForce but their solution is still available) uses a Spark+Hadoop+HBase stack, plus some kind of web server component.

MLeap is yet-another-ML-library-with-limited-feature-set, which can be plugged into Spark/Scikit-Learn/whatever, and can spawn a web service -- or export your model to a hosted solution named Combust.ml

MLDB is yet-another-ML-library-with-limited-feature-set, completely outside of the Python/Spark ecosystem, but claims full integration with TensorFlow -- including the ability to import existing Deep Learning models and tweak them for different uses.