1
votes

I tried Mahout's item-based distributed RecommenderJob on the MovieLens 10M dataset, and everything runs fine. However, when checking the recommendations for individual users, I noticed that the recommended list contains items the user has already rated. To be more specific:

Let's say the user with userid 4 has watched the movies with the following ids: [123, 543, 234, 567, 324]. The recommended list then contains 543 and 234 again. I looked through Mahout in Action to understand the algorithm, but I could not find a code segment that eliminates already-rated items before the top-K list is produced. Am I missing something, or is it normal that it recommends already-rated items?

If it is normal, is it possible to eliminate those items from the candidate items?

P.S.: Filtering the recommendations after they are produced does not work well in my case: I want 100 recommendations per user, and after filtering this number drops to 30 or so for some users.

Thanks in advance.

3 Answers

3
votes

The code has changed a lot since I first made it, and there are several RecommenderJobs, but originally there was a phase that added a "(user,item,NaN)" tuple to the final vector sum for all existing user-item pairs. This made the sum NaN for every such pair, so those pairs could be excluded from the result. It may not be in there anymore.
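To illustrate the idea, here is a minimal sketch in plain Java. It is not the actual RecommenderJob code: the class `NaNMaskingSketch` and the methods `sumAndMask` and `topK` are hypothetical names, and ordinary maps stand in for Mahout's vectors. It only shows why adding a NaN for each already-rated item poisons that item's sum, so it can be dropped before the top-K list is built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names come from Mahout.
public class NaNMaskingSketch {

    /** Sums partial score maps, then adds NaN for every already-rated item. */
    static Map<Long, Double> sumAndMask(List<Map<Long, Double>> partials, Set<Long> ratedItems) {
        Map<Long, Double> sum = new HashMap<>();
        for (Map<Long, Double> partial : partials) {
            for (Map.Entry<Long, Double> e : partial.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Stand-in for the "(user,item,NaN)" tuple: anything plus NaN is NaN.
        for (Long itemId : ratedItems) {
            sum.merge(itemId, Double.NaN, Double::sum);
        }
        return sum;
    }

    /** Returns up to k item ids with the highest scores, skipping NaN entries. */
    static List<Long> topK(Map<Long, Double> scores, int k) {
        List<Map.Entry<Long, Double>> candidates = new ArrayList<>();
        for (Map.Entry<Long, Double> e : scores.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                candidates.add(e);
            }
        }
        candidates.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, candidates.size()); i++) {
            result.add(candidates.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> partial1 = new HashMap<>();
        partial1.put(543L, 4.0);
        partial1.put(900L, 2.0);
        Map<Long, Double> partial2 = new HashMap<>();
        partial2.put(900L, 1.5);
        partial2.put(901L, 1.0);

        // Items 543 and 234 are already rated by the user.
        Map<Long, Double> scores = sumAndMask(List.of(partial1, partial2), Set.of(543L, 234L));
        System.out.println(topK(scores, 100)); // prints [900, 901]
    }
}
```

Because the masking happens before the top-K selection, the list is only shorter than k when there are fewer than k unrated candidates, which is the behaviour the question asks for.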

1
votes

I'm one of the authors of RecommenderJob. We have unit tests that explicitly check that users are not recommended items they already know. If this really happens, it is a serious bug. Can you give an example of input data where you see this happen?

It would also be better to move this discussion to the Mahout mailing list at https://cwiki.apache.org/confluence/display/MAHOUT/Mailing+Lists,+IRC+and+Archives

0
votes

In the source code of RecommenderJob:

addOption("filterFile", "f", "File containing comma-separated userID,itemID pairs. Used to exclude the item from " + "the recommendations for that user (optional)", null);

I think you can solve your problem by feeding such a file to RecommenderJob.
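As a rough sketch of how such a file could be produced, the snippet below turns rating lines into "userID,itemID" pairs. Several things here are assumptions rather than anything confirmed by the option description: the class `FilterFileSketch` and method `toFilterLines` are hypothetical, the input is assumed to be comma-separated `userID,itemID,rating` lines, and the filter file is assumed to hold one pair per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Mahout.
public class FilterFileSketch {

    /** Converts "userID,itemID[,rating]" lines into "userID,itemID" lines. */
    static List<String> toFilterLines(List<String> ratingLines) {
        List<String> out = new ArrayList<>();
        for (String line : ratingLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            if (fields.length < 2) {
                continue; // skip lines without both a user and an item
            }
            out.add(fields[0].trim() + "," + fields[1].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ratings = List.of("4,123,5.0", "4,543,3.5", "", "4,234,4.0");
        for (String pair : toFilterLines(ratings)) {
            System.out.println(pair);
        }
    }
}
```

The resulting lines could be written out with `java.nio.file.Files.write` and the file passed through the `filterFile` option quoted above. If the job already excludes known items, as the unit tests mentioned in another answer suggest, this step should be redundant.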