I'm new to Mahout-Samsara and I'm trying to understand the "domain" of the different projects and how they relate to each other. I understand that Apache Mahout-Samsara deprecates many MapReduce algorithms, and that things will be based on Apache Flink or Spark or other engines like h2o ( based on the introduction of the "Apache Mahout: Beyond MapReduce" book).
I want to try some recommender algorithms but I'm not so sure about what's new and what's 'deprecated'. I see the following links,
referring to spark-rowsimilarity
and spark-itemsimilarity
. (I don't understand if these links are talking about an off-the-self algorithm or a design... it's probably a design because they are not listed at mahout dot apachedot org/users/basics/algorithms.html ... anyways...).
And at the same time, Apache Flink (or is it Spark MLLib?) implements the ALS algorithm for recommendation (Machine Learning for Flink and Spark MLlib).
General questions:
Is it that these algorithms from mahout.apache.org are deprecated and they are being migrated to Flink / Spark MLLib, so that the ML library and support at Flink / Spark MLLib will grow?
Is Flink / Spark MLLib intended to be more an engine or engine + algorithm library with good support for the algorithms?
Other links to help the conversation:
Specific question:
I want to try a recommender algorithm as a 'gray box' (part 'black box' because I don't want to get too deep into the math, part 'white box' because I want to tweak the model and the math to the extent that I need to improve results).
I'm not interested in other ML algorithms yet. I thought about starting with what's off-the-shelf and then changing the ALS implementation of MLLib. Would that be a good approach? Any other suggestions?