DataModel usage with FileItemSimilarity in Mahout

Question

I'm building a recommender where the actual similarity computation is done with ItemSimilarityJob, and the result is then loaded into a non-distributed recommender through FileItemSimilarity.

All this works so far (2), but there's one thing I'm a bit puzzled about.

When instantiating the recommender (GenericItemBasedRecommender), I have to pass along a data model, which would be FileDataModel in my case. But since the similarity computation has already taken place, I don't really know what data I should pass into the model.

Clearly the model is used to determine the maximum and minimum preference values and the item and user IDs. As for the users, I'm planning to have only anonymous "profiles" anyway, so would it be OK to pass along fake data?

How is that supposed to work? The Mahout examples (1) and the MiA book don't give any answers on that, but both state that pre-computation is the way to go :(

(1) I'm running on Mahout 0.7 but also looked into trunk already.

(2) I had to convert the generated similarity matrix into a textual format myself, of course.

Sean Owen · Accepted Answer · 2013-06-30T17:29:48

You should pass the same DataModel that was fed to the similarity computation. The recommender's output is certainly a function of the similarities, but also of the original data, of course. That's why it's an input.

You could in theory build similarities off a different DataModel than the one you are actually making recommendations from. That is possible and might make sense in some cases, but it is not the normal setup.
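To illustrate why the preference data is an input alongside the similarities, here is a minimal sketch of an item-based estimate as a similarity-weighted average of the user's own preferences. It is plain Java and does not call Mahout. The class `ItemBasedEstimate` and its `estimate` method are hypothetical names made up for this illustration, and the formula is a simplification rather than Mahout's actual implementation.

```java
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Mahout. */
final class ItemBasedEstimate {

    private ItemBasedEstimate() {}

    /**
     * Estimates a user's preference for one target item.
     *
     * @param userPrefs          the user's known preferences, keyed by item ID
     *                           (this is what the data model supplies)
     * @param similarityToTarget precomputed similarity of each item to the
     *                           target item, keyed by item ID
     * @return similarity-weighted average of the user's preferences, or NaN
     *         if no rated item has a usable similarity to the target
     */
    static double estimate(Map<Long, Float> userPrefs,
                           Map<Long, Double> similarityToTarget) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Float> pref : userPrefs.entrySet()) {
            Double sim = similarityToTarget.get(pref.getKey());
            if (sim == null || sim.isNaN()) {
                continue; // no similarity known for this rated item
            }
            weightedSum += sim * pref.getValue();
            totalSimilarity += sim;
        }
        // Without any preferences to weight, the similarities alone
        // cannot produce an estimate.
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }
}
```

With an empty `userPrefs` map the method returns NaN no matter how complete the similarity map is, which is the situation you would be in if the recommender were handed fake or empty preference data.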
