2
votes

I have tested the user-based recommender in Apache Mahout, and it works well with the sample data provided.

However, with my own data I am not able to get a single recommendation. I suspect this is because the data is too sparse, but I would appreciate the advice of an expert ;)

The data contains only purchase history, so I assigned a rating of 4.0 to every user ID <-> product ID purchase.

Here is the data file : http://we.tl/RcR83vcHQI

Could you give me some advice on how to start getting useful recommendations?

Thanking you in advance.


1 Answer

1
votes

This is a common problem for people new to Mahout. Versions 0.9 and earlier require your IDs to be sequential, contiguous, non-negative integers. This applies to both user and item IDs, which Mahout uses as the row and column numbers in the matrix of all input.

There are several ways to tackle this, such as keeping a HashBiMap (from Guava collections) for user IDs and another for item IDs. When you see the first ID, assign it a Mahout ID of 0 and store the relationship in the map. Keep scanning your IDs; each time you find a new unique one, assign it the next Mahout ID (1, 2, and so on).

Then you'll get Mahout IDs back from the recommender. You can use the bidirectional HashBiMap to translate them back into your application-specific IDs.
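As a rough illustration of that ID translation, here is a minimal sketch. To keep it dependency-free it uses a plain `HashMap` plus an `ArrayList` as a stand-in for Guava's HashBiMap; the class name `IdDictionary` and its methods are hypothetical and not part of Mahout. It also assumes your application IDs are strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Mahout): assigns each application ID
// a contiguous integer 0, 1, 2, ... in order of first appearance,
// and translates back again.
final class IdDictionary {
    // application ID -> Mahout ID
    private final Map<String, Integer> toMahout = new HashMap<>();
    // Mahout ID -> application ID (the list index is the Mahout ID)
    private final List<String> toExternal = new ArrayList<>();

    // Returns the existing Mahout ID, or assigns the next free one.
    int toMahoutId(String externalId) {
        Integer known = toMahout.get(externalId);
        if (known != null) {
            return known;
        }
        int next = toExternal.size();
        toMahout.put(externalId, next);
        toExternal.add(externalId);
        return next;
    }

    // Translates an ID returned by the recommender back to the application ID.
    // Throws IndexOutOfBoundsException for an ID that was never assigned.
    String toExternalId(int mahoutId) {
        return toExternal.get(mahoutId);
    }

    // Number of distinct IDs seen so far.
    int size() {
        return toExternal.size();
    }
}
```

You would keep one dictionary for users and a separate one for items, run every ID through `toMahoutId` while writing the input file, and call `toExternalId` on whatever the recommender returns.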

BTW, Mahout (1.0-snapshot or greater) now has a completely new generation of recommender that uses a search engine to serve recommendations and Mahout to calculate the model. It takes the input you have directly, doing the ID translation internally. It has many benefits over the older Hadoop version, including:

  1. Multimodal: it can ingest many different user actions on many different item sets. This allows you to use much of the user's clickstream to recommend.
  2. Realtime results: it is served by a very fast, scalable server in Solr or Elasticsearch.
  3. Due to its realtime nature, it can recommend to new users or users with very recent history. The older Hadoop Mahout recommenders only recommend to users and items in the training data: they cannot react to history that was not used in training. The new recommender can use data gathered in realtime, even for new users.

The new Multimodal Recommender is described here: