I have been working with Mahout to create a recommendation engine based on the following data:
- 100k users
- 10k items
- 4M ratings
I'm running it on Tomcat with the following JVM arguments:
-Xms1024M -Xmx1024M -da -dsa -XX:NewRatio=9 -server
Recommendations take about 6 s, which seems slow. How can I improve Mahout's performance?
I'm using the following code.
This part runs once at startup:
// Load all ratings from MySQL into memory once
JDBCDataModel jdbcDataModel = new MySQLJDBCDataModel(dataSource);
dataModel = new ReloadFromJDBCDataModel(jdbcDataModel);
// Item-item similarity, with computed pairs cached
ItemSimilarity similarity = new CachingItemSimilarity(new EuclideanDistanceSimilarity(dataModel), dataModel);
SamplingCandidateItemsStrategy strategy = new SamplingCandidateItemsStrategy(10, 5);
// The same strategy object is used for both candidate-item arguments
recommender = new CachingRecommender(new GenericItemBasedRecommender(dataModel, similarity, strategy, strategy));
And for every user request, I do:
recommender.recommend(userId, howMany);
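To tell a one-time warm-up cost from the steady-state cost of a request, the call can be timed over several runs. The helper below is only a sketch of that idea: `LatencyProbe` and `meanMillis` are hypothetical names, not part of Mahout, and the warm-up and run counts are arbitrary.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Mahout): mean wall-clock time of a call.
public class LatencyProbe {

    /**
     * Invokes the call `warmups` times without timing it, then `runs` times
     * with timing, and returns the mean duration of a timed run in milliseconds.
     */
    public static double meanMillis(Callable<?> call, int warmups, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Untimed runs, so JIT compilation and cache filling are not measured.
        for (int i = 0; i < warmups; i++) {
            call.call();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            call.call();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / runs;
    }
}
```

It could be used as `LatencyProbe.meanMillis(() -> recommender.recommend(userId, howMany), 5, 20)`. Since `CachingRecommender` caches results per user, repeating the same `userId` would mostly measure cache hits, so the user ID should vary between runs to measure the uncached path.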
ReloadFromJDBCDataModel loads the data model from the database into memory, so this takes time only once, or am I missing something? – Thibaud