I am working with boolean preference values and trying to evaluate a recommender engine in Mahout. My questions are about choosing the "correct" parameters for the evaluate function. Apologies in advance for the lengthy post.
IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                      DataModelBuilder dataModelBuilder,
                      DataModel dataModel,
                      IDRescorer rescorer,
                      int at,
                      double relevanceThreshold,
                      double evaluationPercentage) throws TasteException;
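For context, here is roughly how I invoke the evaluator at the moment. The file name data.csv and the Tanimoto / user-based recommender inside the builder are only placeholders to make the sketch self-contained, not necessarily my exact setup:

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvaluationExample {
  public static void main(String[] args) throws Exception {
    // Boolean (userID,itemID) data; "data.csv" is just a placeholder.
    DataModel model = new FileDataModel(new File("data.csv"));

    // Builds a fresh recommender for each evaluation run.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new TanimotoCoefficientSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
        return new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(
        builder,
        null,                                                 // dataModelBuilder -> question 1
        model,
        null,                                                 // rescorer -> question 1
        10,                                                   // at -> questions 3 and 5
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // relevanceThreshold -> question 2
        1.0);                                                 // evaluationPercentage -> question 4

    System.out.println("Precision: " + stats.getPrecision());
    System.out.println("Recall:    " + stats.getRecall());
  }
}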
1) Can you think of an example in which the following two parameters must be used (see the toy rescorer sketch after this list):
- DataModelBuilder dataModelBuilder
- IDRescorer rescorer
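To make the rescorer part of the question concrete, here is a toy IDRescorer of the kind I imagine could be passed in. The "excluded item IDs" use case is only my guess at why one would need it, not something my project currently requires:

import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.recommender.IDRescorer;

// Toy rescorer: hides a given set of item IDs (e.g. hypothetical "unavailable"
// items) and leaves the scores of all other items untouched.
public class ExcludeItemsRescorer implements IDRescorer {

  private final FastIDSet excludedItemIDs;

  public ExcludeItemsRescorer(FastIDSet excludedItemIDs) {
    this.excludedItemIDs = excludedItemIDs;
  }

  @Override
  public double rescore(long id, double originalScore) {
    return originalScore; // no score adjustment, filtering only
  }

  @Override
  public boolean isFiltered(long id) {
    return excludedItemIDs.contains(id);
  }
}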
2) For the double relevanceThreshold parameter, I set GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD; however, I was wondering whether a "better" model could be built by setting a different value.
3) In my project I need to recommend at most 10 items per user. Does this mean that it makes no sense to set the int at parameter to a value greater than 10?
4) Given that I don't mind waiting a long time for the model to be built and evaluated, is it good practice to set the double evaluationPercentage parameter to 1? Can you think of any case where 1 would not give the optimum model?
5) Why do precision and recall (note that I am working with boolean data) increase as the number of recommendations (i.e. the int at parameter) increases? I observed this experimentally. (The definitions I have in mind are listed below.)
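For reference, these are the standard definitions I have in mind, where "relevant" means the items the evaluator holds out for a user:

precision@at = (# of recommended items among the top at that are relevant) / (# of items recommended, at most at)
recall@at    = (# of recommended items among the top at that are relevant) / (total # of relevant items)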
6) Where does the splitting into training and test sets take place within Mahout, and how could I change that percentage (unless this does not apply to item-based recommendations)?
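For comparison, the prediction-based evaluators (the RecommenderEvaluator interface) do expose the training fraction explicitly, if I remember the signature correctly, which is part of why I am asking where the IR evaluator's split happens:

double evaluate(RecommenderBuilder recommenderBuilder,
                DataModelBuilder dataModelBuilder,
                DataModel dataModel,
                double trainingPercentage,
                double evaluationPercentage) throws TasteException;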