Is it possible to configure Mahout to retrieve input data from a Cassandra cluster while executing a Recommender Job over Hadoop?

I have found some resources on this topic (see http://www.acunu.com/2/post/2011/08/scaling-up-cassandra-and-mahout-with-hadoop.html), but the instructions described there do not seem to work; I tried both mahout-0.6 and mahout-0.7. For instance, the itemIDIndexPath variable does not seem to exist in the RecommenderJob class or in the abstract classes it extends.

Answer

I've tried running Pig/Hive queries against Cassandra and found them to be rather unstable under load. The problem is that Cassandra's read path is rather inefficient, especially over Thrift. I would recommend dumping the data to HDFS as an intermediate step and processing it from there.
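As a rough illustration of that intermediate step, here is a minimal sketch, assuming your ratings can be read out of Cassandra as (user ID, item ID, preference) triples. It writes them in the plain-text input format that Mahout's RecommenderJob reads, one `userID,itemID,preference` line per rating. The class `PreferenceExport`, its `Pref` row type, and the output file name are hypothetical. The Cassandra read itself is deliberately left out and replaced by in-memory stand-in data, since how you page through your column family depends on your schema and client.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: writes user/item/preference triples as text lines
 * ("userID,itemID,preference"), ready to be copied into HDFS for RecommenderJob.
 */
final class PreferenceExport {

    /** One rating row; in a real export these would come from Cassandra (assumption). */
    public static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        public Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Formats a single rating as a comma-separated line. */
    public static String toLine(Pref p) {
        return p.userId + "," + p.itemId + "," + p.value;
    }

    /** Writes all ratings to a local file, one preference per line. */
    public static void writeAll(List<Pref> prefs, Path out) throws IOException {
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Pref p : prefs) {
                w.write(toLine(p));
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real export would iterate over the Cassandra column family.
        List<Pref> prefs = List.of(new Pref(1L, 101L, 4.0f), new Pref(2L, 101L, 3.5f));
        Path out = Paths.get(args.length > 0 ? args[0] : "prefs.csv");
        writeAll(prefs, out);
        System.out.println("wrote " + prefs.size() + " preferences to " + out);
    }
}
```

The resulting file can then be copied into HDFS with `hadoop fs -put prefs.csv <hdfs-dir>`, and that directory passed as `--input` to RecommenderJob. Decoupling the export from the Mahout job this way means the recommender's Hadoop tasks only ever read from HDFS and never hit Cassandra's read path under load.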