1
votes

I am trying to calculate precision and recall at n for a data set with Boolean preferences, using the item-item recommender provided in Mahout.

I am using GenericBooleanPrefItemBasedRecommender and the following method of RecommenderIRStatsEvaluator:

IRStatistics evaluate(RecommenderBuilder recommenderBuilder, DataModelBuilder dataModelBuilder, DataModel dataModel, IDRescorer rescorer, int at, double relevanceThreshold, double evaluationPercentage) throws TasteException;

Since the preferences are Boolean, the set of "relevant" or "good" movies for a user is all the ones rated 1.

If I run the same code many times, it always gives the same values of precision and recall, and the two are always equal to each other. Why? I am NOT using RandomUtils.useTestSeed(). How does it split the data into training and test sets?

Possibilities:
a) It randomly divides the whole data set into test and training sets at the beginning, OR for each user it randomly puts a fixed percentage of relevant movies into the test set. How does it decide this percentage, since there is no parameter through which the user can set it? Why do I get the same values of P and R each time I run the code, and why are P at n and R at n the same?
b) For each user, it puts all relevant movies in the test set. Then there is no information left about the user to make any recommendations, so this is not possible.

Since the values of P and R at n are equal, does that mean that, for each user, the number of relevant movies moved to the test set each time equals the number of recommendations, i.e. n? If the n relevant movies put in the test set are chosen at random, why do I get the same values of P and R each time I run the code?

The only explanation I can think of that fits the results is that the recommender calculates P and R at n as follows: one by one, for each user, it randomly puts n relevant movies in the test set. The process has to be random, since it cannot distinguish between the relevant movies, but it is fixed in the sense that each run picks the same n relevant movies for each user. It then makes n recommendations and calculates P and R at n.

While this explains the results, I don't think it is a good process because:
1) The training/test split is not defined as a percentage, and is thus not consistent with the usual definition.
2) P and R will always be equal to each other, so we only get one metric as opposed to two.
3) The supposedly random choice of n movies is the same on every run.
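
To make point 2 concrete, here is a minimal sketch in plain Java. It is not Mahout's evaluator; the class name, method names, and item IDs are made up for illustration. It only assumes the process hypothesized above: n relevant items are held out for a user and the recommender returns n items. In that case both metrics share the same numerator (hits) and the same denominator (n), so they cannot differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
final class PrecisionRecallAtN {

    // Number of recommended items that are also in the held-out relevant set.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision = hits / number of recommendations made.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall = hits / number of held-out relevant items.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // n = 5 recommendations (made-up item IDs).
        List<Long> recommended = Arrays.asList(2L, 4L, 7L, 8L, 9L);

        // Case 1: exactly n = 5 relevant items held out, so both denominators are 5.
        Set<Long> fiveHeldOut = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        System.out.println(precision(recommended, fiveHeldOut)); // 2 hits / 5
        System.out.println(recall(recommended, fiveHeldOut));    // 2 hits / 5

        // Case 2: only 3 relevant items held out, so the two metrics diverge.
        Set<Long> threeHeldOut = new HashSet<>(Arrays.asList(2L, 4L, 6L));
        System.out.println(precision(recommended, threeHeldOut)); // 2 hits / 5
        System.out.println(recall(recommended, threeHeldOut));    // 2 hits / 3
    }
}
```

So equal P and R on every run is what one would expect whenever the held-out set and the recommendation list have the same size for each evaluated user.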

EDIT: I am adding my full code in case it helps answer my question:

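import java.io.File;

import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// The class name is a placeholder; the original post showed only main().
public class BooleanPrefEvaluation {
    public static void main(String[] args) throws Exception {
        FileDataModel model = new FileDataModel(new File("data/test.csv"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) {
                ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
                return new GenericBooleanPrefItemBasedRecommender(model, similarity);
            }
        };

        // at = 5, relevance threshold chosen by the evaluator, evaluate 100% of users
        IRStatistics stats = evaluator.evaluate(
                recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}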
2
Welcome to Stack Overflow. Please have a look at the formatting help and try out, for example, code formatting and simple lists. Another problem with your post is that it appears to be too broad and includes more than one question, and/or the main question is not clear. Please be more specific. - user1251007

2 Answers

0
votes

I don't know for sure, but if you seed a random number generator with the same value each time you use it, the sequence of numbers it returns will be identical. Check whether there is a way to seed the RNG with something like the system time. Just a guess.
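
As a generic illustration of that seeding behavior, here is a small sketch using plain java.util.Random. It does not touch Mahout's RandomUtils, and whether the evaluator actually uses a fixed seed is not confirmed here; the class and method names are made up.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class, unrelated to Mahout's own random number handling.
final class SeedDemo {

    // Returns the first `count` values in [0, 100) drawn from a Random created with `seed`.
    static int[] draw(long seed, int count) {
        Random rng = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextInt(100);
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed twice: the two printed sequences are identical.
        System.out.println(Arrays.toString(draw(42L, 5)));
        System.out.println(Arrays.toString(draw(42L, 5)));

        // Seeding from the clock: the sequence will typically change from run to run.
        System.out.println(Arrays.toString(draw(System.nanoTime(), 5)));
    }
}
```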

0
votes

Check out my answer to a related question: How mahout's recommendation evaluator works

I think this will help you understand how the evaluation works, how the relevant items are chosen, and how Precision and Recall are computed.