
I have been struggling to understand how to implement user/item-based recommendation with Mahout Samsara. I have only very basic knowledge of Mahout's MapReduce-based algorithms, and Mahout has now retired MapReduce.

So far I have learned the following:

  1. Mahout Samsara is the code name for all Mahout 0.10+ releases. Mahout has abandoned MapReduce-based algorithms and moved to a Scala-based programming environment. It now supports several distributed engines, such as Spark, H2O, and Flink.

  2. The new Mahout (Samsara) is a Scala-based solution with an R-like Scala DSL (domain-specific language) layer on top.

  3. We can experiment with the Mahout Spark shell by following this document: http://mahout.apache.org/users/sparkbindings/play-with-shell.html

What I am looking for:

I am looking for an example of using spark-rowsimilarity or spark-itemsimilarity from code in Mahout (not the command-line jobs). I also checked this tutorial, but it concentrates on the command line.

Can someone please provide an example of how to implement a user/item-based recommendation engine on the new Mahout? What exactly is the input DataModel? Is it the same as in the previous Mahout? Can a file system data model still be used with the new Spark-based algorithms?


1 Answer


Mahout Samsara is a compute engine for ML. It needs a server to actually ingest data and answer queries.

The final step, which Mahout does not take, is to put the generated model into a k-nearest neighbors (KNN) engine for the query phase. Lucene and its scalable server versions, Solr and Elasticsearch, are perfect for KNN. The query is the most recent part of the user's history. I've implemented this in Ruby with MongoDB integration by writing the model into MongoDB, indexing it with Solr, and then making Solr queries to get recommendations. This was a fair bit of work, so...

The guy who maintains CCO (correlated cross-occurrence) in Mahout (me) also created an Apache PredictionIO template that is extremely full-featured and includes all the components you need in a nearly turnkey solution. It is OSS, so you can either use it as an example or just install and run it.
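
To make the split between model building and the query phase more concrete, here is a minimal plain-Python sketch. It is not Mahout's API and not how the template is implemented: it uses raw co-occurrence counts where Mahout's CCO uses a statistical test, and a dictionary lookup stands in for the Solr/Elasticsearch index. The function names (`build_indicators`, `recommend`) and the sample data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_indicators(histories, top_k=2):
    """Model building: map each item to the top_k items it co-occurs with most.

    Simplification: ranks by raw co-occurrence count, not a significance test.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        # Count every unordered pair of distinct items in one user's history.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        item: [
            other
            for other, _ in sorted(
                neighbours.items(), key=lambda kv: (-kv[1], kv[0])
            )[:top_k]
        ]
        for item, neighbours in counts.items()
    }


def recommend(indicators, recent_history, n=3):
    """Query phase: the user's recent history is the query.

    Each candidate item is scored by how many of its indicator items appear
    in the recent history, which is roughly what a search engine does when
    the indicators are indexed as a field of the item document.
    """
    recent = set(recent_history)
    scores = {}
    for item, item_indicators in indicators.items():
        if item in recent:
            continue  # do not recommend what the user already has
        overlap = len(recent.intersection(item_indicators))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical interaction data: user id -> items interacted with.
histories = {
    "u1": ["ipad", "iphone", "case"],
    "u2": ["ipad", "iphone"],
    "u3": ["iphone", "case"],
    "u4": ["galaxy", "case"],
}

indicators = build_indicators(histories)
print(recommend(indicators, ["iphone", "galaxy"]))
```

In a real deployment the `indicators` mapping would be written to the search engine once per model rebuild, and only `recommend` would run per request, as a search query built from the user's latest history.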