0
votes

I am going through the Mahout in Action book and trying out the distributed recommendation engine example. But instead of the Wikipedia dataset I am using a small dataset for my example.

input.txt->

1,15
1,13
1,12
1,10
2,11
2,15
2,20
2,17
2,12
3,10

user.txt->

3

where input.txt and user.txt are of the form user,item: input.txt represents the items each user has purchased in the past, and user.txt represents the current purchases.

When I copy these two files into HDFS and run

hadoop jar mahout-core-0.5-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output \
  --usersFile input/user.txt \
  --booleanData

the MapReduce job runs properly. However, when I check the output with

bin/hadoop fs -cat output/part-r-00000

I find an empty file.

Can someone explain to me what's going wrong? If I understand correctly, the RecommenderJob should build an item-item similarity matrix, multiply it by the user-item matrix (from user.txt), and produce the result.
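To make the expected behavior concrete, here is a minimal Python sketch of the cooccurrence-style recommendation described above, applied to the tiny dataset from input.txt. This is only a conceptual illustration of the idea (cooccurrence matrix times a boolean preference vector), not Mahout's actual implementation:

```python
from collections import defaultdict
from itertools import combinations

# The preferences from input.txt, grouped by user (boolean data).
prefs = {
    1: {15, 13, 12, 10},
    2: {11, 15, 20, 17, 12},
    3: {10},
}

# Build an item-item cooccurrence matrix: cooc[a][b] counts how many
# users purchased both item a and item b.
cooc = defaultdict(lambda: defaultdict(int))
for items in prefs.values():
    for a, b in combinations(items, 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

# "Multiply" the matrix by user 3's boolean preference vector: sum the
# cooccurrence rows of every item the user already has, skipping items
# the user already purchased.
scores = defaultdict(int)
for item in prefs[3]:
    for other, count in cooc[item].items():
        if other not in prefs[3]:
            scores[other] += count

print(dict(scores))
```

With this naive sketch, user 3 (who has only item 10) does get candidate items 12, 13 and 15 via user 1, each with a cooccurrence count of 1 — which is why the empty Mahout output is surprising at first glance.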

I need some help understanding this. I am using Mahout 0.5 and Hadoop 1.2 on a single node. I hope it's not a version compatibility issue.

EDIT

I get an answer if I change the user.txt to

2

or

1

Could you post here the code that you are using? Maybe you are using an invalid DataModel... – Alessandro Suglia

I don't think it's a problem with the model. Check my edits. – Abhiroop Sarkar

Maybe user 3 has insufficient preferences and the program is unable to make recommendations for him? – Alessandro Suglia

Well, yes; as you can see in the file, the user has only one preference. However, even for one transaction the user-item matrix should have one entry, and multiplication with the item-item similarity matrix should give some result. I am not sure I quite understand how the internals work. – Abhiroop Sarkar

@AbhiroopSarkar: you forgot to vote/accept or give some kind of feedback to the existing answer. – tokland

1 Answer

1
votes

First of all, use Mahout 0.9 or the current source build; 0.5 is very old and outdated. I know it's the version used in the book, but many of the examples will still work with newer code.

Second, you have very few cooccurrences in your data. If you want to understand how a recommender works, try this blog post. Using such small datasets can easily produce no cooccurrences, which will result in empty recommendations. The post includes a very small dataset that is designed to produce some recs, but it will not produce recs for all users.

Third, make sure to use Mahout IDs for all items and users. That means row and column numbers in a user x item matrix: they must run from 0 to number-of-items-minus-one for item IDs and from 0 to number-of-users-minus-one for user IDs. Using anything else will cause spurious results. This restriction has been removed for several of the Mahout 1.0 Spark jobs, where you can use any unique string, but the Hadoop MapReduce code still expects these IDs.
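A simple way to satisfy that convention is to remap your raw IDs to contiguous 0-based indices before handing the data to the Hadoop job. Here is a hypothetical Python helper sketching the idea (the function name and structure are my own, not part of Mahout):

```python
def remap(pairs):
    """Remap arbitrary (user, item) IDs to contiguous 0-based indices.

    Returns the remapped pairs plus the user and item dictionaries so
    the Mahout output can be translated back to the original IDs.
    """
    user_ids, item_ids = {}, {}
    out = []
    for user, item in pairs:
        # Assign the next free index the first time an ID is seen.
        u = user_ids.setdefault(user, len(user_ids))
        i = item_ids.setdefault(item, len(item_ids))
        out.append((u, i))
    return out, user_ids, item_ids

# A few rows from the question's input.txt:
raw = [(1, 15), (1, 13), (2, 11), (2, 15), (3, 10)]
remapped, users, items = remap(raw)
# users -> {1: 0, 2: 1, 3: 2}
# items -> {15: 0, 13: 1, 11: 2, 10: 3}
```

You would write the remapped pairs to HDFS, run RecommenderJob on them, and use the two dictionaries to map the recommended item indices back to your real item IDs.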