This is a common problem for people new to Mahout. Version 0.9 and earlier require your IDs to be contiguous, non-negative integers, for both user and item IDs. Mahout uses them directly as the row and column indices of the matrix built from all of your input.
There are several ways to tackle this. One is to keep a HashBiMap (from Guava collections) for user IDs and another for item IDs. When you see the first ID, assign it Mahout ID 0 and store the pair in the map. Keep scanning your IDs, and assign each new unique one the next Mahout ID (1, 2, and so on).
The recommender will then return Mahout IDs, which you can translate back into your application-specific IDs using the bidirectional HashBiMap.
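Here is a minimal sketch of that mapping in Java, assuming your application IDs are strings. The class name `IdMapper`, its methods, and the sample data are hypothetical. To keep it JDK-only, it uses a `HashMap` for app-to-Mahout lookups and an `ArrayList` whose index *is* the Mahout ID for the reverse direction. Guava's `HashBiMap` (with its `inverse()` view) would do the same job if you already depend on Guava.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: maps application IDs to the dense, zero-based
 * integer IDs Mahout 0.9 expects, and translates them back.
 */
public class IdMapper {
    // application ID -> Mahout ID
    private final Map<String, Integer> toMahout = new HashMap<>();
    // Mahout ID -> application ID; the list index is the Mahout ID
    private final List<String> toApp = new ArrayList<>();

    /** Returns the Mahout ID for appId, assigning the next free integer on first sight. */
    public int toMahoutId(String appId) {
        Integer id = toMahout.get(appId);
        if (id == null) {
            id = toApp.size(); // 0 for the first unique ID, 1 for the next, ...
            toMahout.put(appId, id);
            toApp.add(appId);
        }
        return id;
    }

    /** Translates a Mahout ID returned by the recommender back to the application ID. */
    public String toAppId(long mahoutId) {
        return toApp.get((int) mahoutId);
    }

    /** Number of distinct IDs seen so far. */
    public int size() {
        return toApp.size();
    }

    public static void main(String[] args) {
        IdMapper users = new IdMapper();
        IdMapper items = new IdMapper();
        // Made-up input: each row is {userId, itemId}
        String[][] prefs = {
            {"u-alice", "sku-42"}, {"u-bob", "sku-42"}, {"u-alice", "sku-7"}
        };
        for (String[] p : prefs) {
            int u = users.toMahoutId(p[0]);
            int i = items.toMahoutId(p[1]);
            // The line you would write to Mahout's input file
            System.out.println(u + "," + i);
        }
        // Suppose the recommender returned Mahout item ID 1 for some user
        System.out.println(items.toAppId(1));
    }
}
```

Running `main` prints `0,0`, `1,0`, `0,1`, then `sku-7`. Because IDs are handed out in order of first appearance, they stay contiguous from 0, which is the property the older Mahout versions need.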
BTW, Mahout 1.0-SNAPSHOT and later has an entirely new generation of recommender that uses a search engine to serve recommendations and Mahout to calculate the model. It takes your input directly and does the ID translation internally. It has many benefits over the older Hadoop version, including:
- Multimodal: it can ingest many different user actions on many different item sets. This allows you to use much of the user's clickstream to make recommendations.
- Realtime results: it serves recommendations from Solr or Elasticsearch, which are very fast and scalable.
- Because it works in realtime, it can recommend to new users or to users with very recent history. The older Hadoop Mahout recommenders only recommend to users and items in the training data; they cannot react to history that was not used in training. The new recommender can use data gathered in realtime, even for new users.
The new Multimodal Recommender is described here: