
I am working with boolean values, trying to evaluate a recommender engine in Mahout. My questions are about the selection of the "correct" parameters for the evaluate function. Apologies in advance for the lengthy post.

  IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                        DataModelBuilder dataModelBuilder,
                        DataModel dataModel,
                        IDRescorer rescorer,
                        int at,
                        double relevanceThreshold,
                        double evaluationPercentage) throws TasteException;
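
For context, here is roughly how I call it at the moment (a fragment with imports omitted; the file name and the item-based/log-likelihood choices are just illustrative stand-ins for my actual setup). The comments map each argument onto the questions below:

  // Boolean data: only "user interacted with item", no preference values.
  DataModel dataModel = new GenericBooleanPrefDataModel(
      GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("prefs.csv"))));

  RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
    @Override
    public Recommender buildRecommender(DataModel model) throws TasteException {
      return new GenericBooleanPrefItemBasedRecommender(model, new LogLikelihoodSimilarity(model));
    }
  };

  RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
  IRStatistics stats = evaluator.evaluate(
      recommenderBuilder,
      null,                                                 // dataModelBuilder     -> question 1
      dataModel,
      null,                                                 // rescorer             -> question 1
      10,                                                   // at                   -> questions 3 and 5
      GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // relevanceThreshold   -> question 2
      1.0);                                                 // evaluationPercentage -> question 4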

1) Can you think of an example in which the following two parameters must be used:

 - DataModelBuilder dataModelBuilder
 - IDRescorer rescorer

2) For the double relevanceThreshold variable, I set GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD; however, I was wondering whether a "better" model could be built by setting a different value.

3) In my project, I need to recommend at most 10 items per user. Does this mean that it doesn't make sense to set a value bigger than 10 for the int at variable?

4) Given that I don't mind waiting a long time for the model to be built, is it good practice to set the double evaluationPercentage variable to 1? Can you think of any case where 1 would not give the optimal model?

5) Why do precision and recall (note that I am working with boolean data) increase as the number of recommendations (i.e. the int at variable) increases? (I observed this experimentally.)

6) Where does the splitting into training and test sets take place within Mahout, and how could I change that percentage (unless this does not apply to item-based recommendations)?


1 Answer


Accurate recommendations alone do not guarantee users of a recommender system an effective and satisfying experience, so these measurements should be taken only as a reference point. That said, ideally you would have real users try your system against a baseline you set (such as random recommendations), run an A/B test, and see which performs better. But that can be troublesome and not very practical.

Precision and recall at N recommendations are not great metrics for recommenders. You are better off using a metric like AUC (area under the curve).

  1. Have a look at the Mahout in Action book example (link).
  2. Letting Mahout choose a threshold is fine, but it will be more computationally expensive.
  3. Yes, if you are making 10 recommendations, evaluating at 10 makes a lot of sense.
  4. It really depends on the size of your data. If using 100% (that is, 1.0) is fast enough, I would use that. But if you do use something smaller, I would strongly suggest you call RandomUtils.useTestSeed() when testing, so the sampling is done in the same way every time you evaluate (see the sketch after this list). Don't use it in production, though.
  5. Not sure; it depends on what your data looks like. But normally, if precision increases, recall decreases and vice versa. See the F1 score (also available from Mahout's IRStatistics).
  6. For IRStatistics I'm not entirely sure where it happens (or whether it happens at all). Notice that it doesn't even take a percentage for splitting into training and test sets, although there might be a default somewhere. If I were you, I would go through the Mahout code and find out.
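
To make items 2, 4 and 5 a bit more concrete, here is a minimal sketch along the lines of the Mahout in Action boolean example. The file name, similarity, and neighborhood size are placeholders, not recommendations; swap in whatever your project actually uses. It also passes a DataModelBuilder so that the training portion is rebuilt as a boolean-preference model, which is the usual reason you would need that parameter (your question 1):

  import java.io.File;

  import org.apache.mahout.cf.taste.common.TasteException;
  import org.apache.mahout.cf.taste.eval.DataModelBuilder;
  import org.apache.mahout.cf.taste.eval.IRStatistics;
  import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
  import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
  import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
  import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
  import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.model.PreferenceArray;
  import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
  import org.apache.mahout.cf.taste.recommender.Recommender;
  import org.apache.mahout.cf.taste.similarity.UserSimilarity;
  import org.apache.mahout.common.RandomUtils;

  public class BooleanPrefEvaluation {

    public static void main(String[] args) throws Exception {
      // Fix the random seed so sampling is identical across runs (evaluation only,
      // never in production).
      RandomUtils.useTestSeed();

      // "prefs.csv" is a placeholder for your userID,itemID file.
      DataModel dataModel = new GenericBooleanPrefDataModel(
          GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("prefs.csv"))));

      RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel model) throws TasteException {
          UserSimilarity similarity = new LogLikelihoodSimilarity(model);
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
          return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
        }
      };

      // Without this, the evaluator rebuilds the training split as a plain rating-based
      // model; supplying it keeps the training data boolean as well (your question 1).
      DataModelBuilder modelBuilder = new DataModelBuilder() {
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
          return new GenericBooleanPrefDataModel(
              GenericBooleanPrefDataModel.toDataMap(trainingData, true));
        }
      };

      RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
      IRStatistics stats = evaluator.evaluate(
          recommenderBuilder,
          modelBuilder,
          dataModel,
          null,                                                 // no rescorer
          10,                                                   // evaluate at 10 (item 3)
          GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // or an explicit threshold (item 2)
          1.0);                                                 // evaluate on all users (item 4)

      System.out.println("Precision@10: " + stats.getPrecision());
      System.out.println("Recall@10:    " + stats.getRecall());
      System.out.println("F1@10:        " + stats.getF1Measure());  // item 5
    }
  }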