Mahout Datamodel with duplicate user,item entries but different preference values

Question

I was wondering how the distributed Mahout recommender job org.apache.mahout.cf.taste.hadoop.item.RecommenderJob handles CSV files in which the same user,item pair appears two or three times with different preference values. For example, suppose I had a .csv file with entries like

1,1,0.7
1,2,0.7
1,2,0.3
1,3,0.7
1,3,-0.7

How would Mahout's data model handle this? Would it sum the preference values for a given user,item pair (e.g. for user,item 1,2 the preference would be 0.7 + 0.3), average them (e.g. for user,item 1,2 the preference would be (0.7 + 0.3)/2), or keep only the last user,item entry it reads (e.g. for user,item 1,2 the preference would be 0.3)?

I ask this question because I am considering recommendations based on multiple preference metrics (item views, likes, dislikes, saves to shopping cart, etc.). It would be helpful if the data model treated the preference values as linear weights (e.g. an item view plus a save to the wish list gives a higher preference score than an item view alone). If the data model already handles this by summing, it would save me the chore of an additional map-reduce job to sort and calculate total scores across the metrics. Any clarification on how Mahout's .csv data model works in this respect for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob would be really appreciated. Thanks.

Seems like this can be solved by using the R implementation of the k-means algorithm. Just wanted to share the info. — Swamy

Sean Owen · Accepted Answer · 2013-05-17T15:48:49

No, it overwrites. The model is not additive. However, the model in Myrrix, a derivative of this code (that I'm commercializing), has a fundamentally additive data model, for just the reason you give. The input values are weights and are always added.
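
Because duplicates are overwritten rather than summed, the weights would have to be combined before the file is handed to the job. The following is only a minimal sketch of that pre-aggregation step in plain Python, not part of Mahout; the function name `sum_duplicate_preferences` is hypothetical, and it assumes the input is small enough to aggregate in memory and that every row has exactly three comma-separated fields.

```python
import csv
import io
from collections import defaultdict


def sum_duplicate_preferences(lines):
    """Collapse repeated user,item rows into one row holding the summed value.

    `lines` is any iterable of CSV text lines of the form user,item,value.
    This helper is a hypothetical illustration, not a Mahout API.
    """
    totals = defaultdict(float)
    for user, item, value in csv.reader(lines):
        totals[(user, item)] += float(value)
    # Sort the keys so the output order is deterministic.
    return [(user, item, total) for (user, item), total in sorted(totals.items())]


# The sample rows from the question.
sample = io.StringIO("1,1,0.7\n1,2,0.7\n1,2,0.3\n1,3,0.7\n1,3,-0.7\n")

for user, item, total in sum_duplicate_preferences(sample):
    # One line per unique user,item pair, ready to be written back out as CSV.
    print(f"{user},{item},{total:.1f}")
```

Each user,item pair then appears only once in the output, so overwriting no longer loses information. For data too large for one machine, the same grouping and summing would need to run as the extra map-reduce step mentioned in the question.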
