If I am using SIMILARITY_LOGLIKELIHOOD (LLR) are item ratings really ignored?

Question

I used the MovieLens data file u.data (from ml-100k.zip) unchanged, so it had the columns userID, MovieID, and user rating.

I used LLR:

hadoop jar C:\hdp\mahout-0.9.0.2.1.3.0-1981\core\target\mahout-core-0.9.0.2.1.3.0-1981-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_LOGLIKELIHOOD --input u.data --output udata_output

When I look at the udata_output file, I see recommended movie IDs followed by recommendation scores, like:

1226:5.0 and 896:4.798878

The recommendation scores ranged from 5.0 down to 4.x.

However, when I deleted the user rating column from the u.data file and re-ran the same command line above I received results like:

615:1.0

where ALL recommendation scores were 1.0.
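If you want to repeat the second run, stripping the rating column can be scripted. Below is a minimal sketch. It assumes each u.data line holds tab- or comma-separated fields with the user ID first and the item ID second. `drop_ratings` is a hypothetical helper, not part of Mahout:

```python
import re


def drop_ratings(lines):
    """Keep only the userID and itemID fields of each tab- or comma-separated line."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r"[\t,]", line)
        out.append(f"{fields[0]},{fields[1]}")  # rating (and anything after it) is dropped
    return out


if __name__ == "__main__":
    # Made-up lines in the u.data layout: user, item, rating, timestamp
    sample = ["1\t101\t5\t0", "2\t102\t3\t0"]
    print(drop_ratings(sample))  # ['1,101', '2,102']
```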

2 questions:

1) If LLR ignores user ratings, and the only input I change is whether I provide the ratings, why do the recommendation scores change?

2) Overall, I am trying to determine a recommendation ranking, so I'm using LLR. In addition, should I ignore the recommendation scores and focus only on the order of the recommended items (e.g., the first item is ranked higher than the second)?

Thanks in advance.

pferrel · Accepted Answer · 2015-02-07T16:24:45

LLR does not use the preference strengths (ratings). The theory is that if a user interacted with an item at all, that is all the indication needed. LLR correlates that interaction with other users' interactions and scores item pairs with a probabilistic calculation called the log-likelihood ratio. It does produce strengths (similarity scores), but it computes them only from counts of interactions.
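To make "only uses the counts" concrete, here is a minimal Python sketch of the log-likelihood ratio (G²) on a 2x2 table of interaction counts, written in its entropy form. It is an illustration, not Mahout's code, and the names `llr` and `item_llr` are hypothetical. Note that `item_llr` never reads the rating field of its input triples:

```python
import math
from collections import defaultdict


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for the 2x2 count table [[k11, k12], [k21, k22]]."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def item_llr(triples, item_a, item_b):
    """Score two items from (user, item, rating) triples; the rating is ignored."""
    users_by_item = defaultdict(set)
    all_users = set()
    for user, item, _rating in triples:
        users_by_item[item].add(user)
        all_users.add(user)
    a, b = users_by_item[item_a], users_by_item[item_b]
    k11 = len(a & b)              # users who interacted with both items
    k12 = len(a - b)              # users of item_a only
    k21 = len(b - a)              # users of item_b only
    k22 = len(all_users - a - b)  # users of neither
    return llr(k11, k12, k21, k22)


if __name__ == "__main__":
    rated = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 1), ("u2", "m2", 2), ("u3", "m3", 3)]
    unrated = [(u, i, 1.0) for u, i, _ in rated]
    print(item_llr(rated, "m1", "m2") == item_llr(unrated, "m1", "m2"))  # True
```

Changing every rating in the input leaves the score unchanged, because only who interacted with what is counted.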

Answers

This could be a bug, or it could be because you are using a boolean recommender in one case and a non-boolean one in the other. It could be that the non-boolean recommender is trying to predict ratings by taking the preference values into account. But none of this really matters if you are trying to optimize ranking.
You really never need to look at the recommendation weights unless you are trying to predict ratings, which seldom happens these days. Trust the ranking of the recs.
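One possible explanation for question 1, stated as an assumption rather than as a reading of Mahout 0.9's source: if the non-boolean path estimates each score as a similarity-weighted average of the user's own preference values, then ratings of 1 to 5 give scores in that range. Input with no ratings (every preference treated as 1.0) then makes every score exactly 1.0, whatever the similarities are. A ranking-oriented, boolean-style score such as the sum of similarities still orders the items. All names below are hypothetical:

```python
def weighted_average(sims, prefs):
    """Similarity-weighted average of the user's preference values (an estimated rating)."""
    den = sum(abs(s) for s in sims)
    if den == 0:
        return 0.0
    return sum(s * p for s, p in zip(sims, prefs)) / den


def similarity_sum(sims):
    """Boolean-style score: total similarity between the candidate and the user's items."""
    return sum(sims)


def top_n(scores, n=10):
    """scores maps item_id -> score; returns up to n item_ids, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical similarities of two candidates to the two items a user has
    sims = {"item_a": [0.9, 0.2], "item_b": [0.3, 0.1]}
    ones = [1.0, 1.0]  # no ratings in the input: every preference is 1.0
    print({i: weighted_average(s, ones) for i, s in sims.items()})  # {'item_a': 1.0, 'item_b': 1.0}
    print({i: weighted_average(s, [5.0, 4.0]) for i, s in sims.items()})
    print(top_n({i: similarity_sum(s) for i, s in sims.items()}))  # ['item_a', 'item_b']
```

Under this assumption the flat 1.0 scores carry no information at all, while the order produced by a ranking score still does.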

BTW, Mahout now has a completely new generation of recommender, which uses Mahout to calculate the model and a search engine to serve recommendations. It has many benefits over the older Hadoop version, including:

Multimodal: it can ingest many different user actions on many different item sets. This allows you to use much of the user's clickstream to recommend.
Realtime results: it serves recommendations from a very fast, scalable search engine, Solr or Elasticsearch.
New and recently active users: because it works in realtime, it can recommend to new users or users with very recent history. The older Hadoop Mahout recommenders only recommend to users and items in the training data; they cannot react to history that was not used in training. The new recommender can use data gathered in realtime, even for new users.
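Here is a toy, in-memory stand-in for how the search-engine approach can serve a user who was not in the training data. Each item carries a set of indicator items, computed offline (for example, by LLR-filtered co-occurrence), and the user's most recent history is the query. A real deployment would index these sets in Solr or Elasticsearch and let the engine's relevance scoring rank the hits. Counting overlaps, as below, is a simplification, and `recommend` is a hypothetical name:

```python
def recommend(indicators, recent_history, n=10):
    """indicators: item_id -> set of indicator item_ids (one 'document' per item).
    recent_history: item_ids the user interacted with recently (the 'query').
    Returns up to n item_ids, most matching indicators first."""
    history = set(recent_history)
    scores = {}
    for item, indicator_items in indicators.items():
        if item in history:
            continue  # don't recommend items the user already interacted with
        overlap = len(indicator_items & history)
        if overlap:
            scores[item] = overlap
    # Sort by overlap (descending), then by item_id for a deterministic tie order
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n]


if __name__ == "__main__":
    indicators = {"m1": {"m2", "m3"}, "m4": {"m2"}, "m5": {"m9"}}
    # A brand-new user who has just viewed m2 and m3: no retraining needed
    print(recommend(indicators, ["m2", "m3"]))  # ['m1', 'm4']
```

The model (the indicator sets) stays fixed; only the query changes, which is why fresh history can be used immediately.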

The new multimodal recommender in Mahout 1.0-SNAPSHOT and later is described in:

Mahout site
A free ebook, which talks about the general idea: Practical Machine Learning
A slide deck, which talks about mixing actions or other indicators: Creating a Unified Multimodal Recommender
Two blog posts: What's New in Recommenders: part #1 and What's New in Recommenders: part #2
A post describing the log-likelihood ratio: Surprise and Coincidence. LLR is used to reduce noise in the data while keeping the calculation at O(n) complexity.
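Here is one way the noise reduction and the linear cost can play out, sketched as an assumption rather than as Mahout's exact procedure. After scoring item pairs with LLR, keep only each item's k highest-scoring neighbours. Low-scoring, coincidental co-occurrences are dropped, and with k fixed the stored model grows linearly with the number of items. `sparsify` is a hypothetical helper that takes already-computed pair scores:

```python
import heapq
from collections import defaultdict


def sparsify(pair_scores, k):
    """pair_scores maps (item_a, item_b) -> LLR score.
    Returns item -> up to k neighbour items, highest score first."""
    candidates = defaultdict(list)
    for (a, b), score in pair_scores.items():
        # LLR is symmetric, so each pair is a candidate neighbour for both items
        candidates[a].append((score, b))
        candidates[b].append((score, a))
    return {
        item: [other for _score, other in heapq.nlargest(k, cands)]
        for item, cands in candidates.items()
    }


if __name__ == "__main__":
    scores = {("a", "b"): 9.0, ("a", "c"): 1.0, ("a", "d"): 5.0, ("b", "c"): 2.0}
    print(sparsify(scores, 2))
```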
