
I am trying to fully understand Amazon's item-to-item algorithm so I can apply it to my system and recommend items the user might like, based on the items the user liked before.

So far I have read the Amazon paper, the item-to-item presentation, and item-based algorithms. I also found this question, but it only left me more confused.

As far as I can tell, I need to follow these steps to get the list of recommended items:

  1. Build a data set of which items each user liked (I have set liked=1 and not liked=0).
  2. Compute the Pearson correlation score (how is this done? I found the formula, but is there an example?).
  3. Then what should I do?
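For reference, the Pearson correlation between two items can be computed over only the users who rated both of them. The plain-Python sketch below assumes a `user -> {item: score}` layout; the function name `pearson` and the sample data are hypothetical. With liked=1/not liked=0 values the same formula still applies (it reduces to the phi coefficient), but an item whose scores are all equal has zero variance, so the sketch returns 0.0 in that case.

```python
from math import sqrt


def pearson(ratings, item_a, item_b):
    """Pearson correlation between two items over the users who rated both.

    `ratings` maps user -> {item: score}; this layout is an assumption.
    """
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    n = len(common)
    if n < 2:
        return 0.0  # not enough co-rating users to correlate
    xs = [ratings[u][item_a] for u in common]
    ys = [ratings[u][item_b] for u in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined when an item's scores are all equal
    return cov / sqrt(var_x * var_y)


# Hypothetical example data: users' scores on a 1-5 scale.
ratings = {
    "alice": {"book": 5, "film": 4, "game": 1},
    "bob": {"book": 3, "film": 2, "game": 4},
    "carol": {"book": 4, "film": 5, "game": 2},
}
print(round(pearson(ratings, "book", "film"), 3))  # 0.655
print(round(pearson(ratings, "book", "game"), 3))  # -0.982
```

Doing this for every pair of items gives the item-item similarity matrix used in the next step.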

So I came up with these questions:

  1. What are the differences between item-to-item and item-based filtering? Are they the same algorithm?
  2. Is it right to replace the rating score with a liked/not-liked value?
  3. Is the item-to-item algorithm the right choice, or is there another one better suited to my case?

Any information about this topic will be appreciated.


1 Answer


Great questions.

Think about your data. You might have unary (consumed or null), binary (liked and not liked), ternary (liked, not liked, unknown/null), or continuous (null and some numeric scale), or even ordinal (null and some ordinal scale). Different algorithms work better with different data types.
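To make these categories concrete, here is one hypothetical user's 1-5 star ratings re-expressed in the coarser forms. The function name `encodings`, the `like_threshold` of 4, and the data are assumptions for illustration. Notice that the binary form can no longer tell a disliked item from one the user never saw.

```python
def encodings(stars, like_threshold=4):
    """Re-express one user's 1-5 star ratings in coarser scales.

    `stars` maps item -> star rating, with None meaning "never rated".
    `like_threshold` is a hypothetical cut-off for counting a rating as a like.
    """
    # Unary: only which items were consumed survives.
    unary = {item for item, s in stars.items() if s is not None}
    # Binary: every item is 1 (liked) or 0, so "disliked" and "unseen" merge.
    binary = {item: int(s is not None and s >= like_threshold)
              for item, s in stars.items()}
    # Ternary: liked (1), not liked (0), or unknown (None).
    ternary = {item: None if s is None else int(s >= like_threshold)
               for item, s in stars.items()}
    return unary, binary, ternary


unary, binary, ternary = encodings({"book": 5, "film": 2, "game": None})
print(sorted(unary))  # ['book', 'film']
print(binary)         # {'book': 1, 'film': 0, 'game': 0}
print(ternary)        # {'book': 1, 'film': 0, 'game': None}
```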

Item-item collaborative filtering (also called item-based) works best with numeric or ordinal scales. If you just have unary, binary, or ternary data, you might be better off with data mining algorithms like association rule mining.
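To give a feel for the association-rule alternative, here is a minimal sketch that scores only single-item rules "A -> B" by support and confidence, using each user's set of liked items as a basket. It is not a full rule miner such as Apriori (which also handles larger itemsets); the function name, thresholds, and baskets are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pair_rules(baskets, min_support=0.0, min_confidence=0.0):
    """Score one-item rules "A -> B" from sets of items each user liked.

    support(A -> B)    = fraction of baskets containing both A and B
    confidence(A -> B) = baskets with A and B / baskets with A
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # both (A, B) and (B, A)
    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n
        confidence = both / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


baskets = [{"book", "film"}, {"book", "film", "game"}, {"book", "game"}, {"film"}]
rules = pair_rules(baskets, min_support=0.3)
print(rules[("game", "book")])    # (0.5, 1.0)
print(("film", "game") in rules)  # False: support is only 0.25
```

A user who liked "game" would then be shown "book", since that rule has high confidence.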

Given a matrix of users and their ratings of items, you can calculate the similarity of every item to every other item. Matrix manipulation and calculation are built into many libraries: try out scipy and numpy in Python, for example. You can simply iterate over items and let the built-in matrix operations do most of the work of computing cosine similarity (https://en.wikipedia.org/wiki/Cosine_similarity). Or download a framework like Mahout or LensKit, which does this for you.
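A minimal plain-Python sketch of the item-item cosine similarity computation follows. It assumes a `user -> {item: score}` layout and treats a missing rating as 0 (one common choice, not the only one); the function name and data are hypothetical.

```python
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between every pair of items.

    `ratings` maps user -> {item: score}; a missing rating is treated as 0,
    which is an assumption (other treatments exist).
    """
    users = sorted(ratings)
    items = sorted({item for r in ratings.values() for item in r})
    # One vector per item, indexed by user.
    vectors = {i: [ratings[u].get(i, 0.0) for u in users] for i in items}
    norms = {i: sqrt(sum(x * x for x in v)) for i, v in vectors.items()}
    sims = {}
    for a in items:
        sims[a] = {}
        for b in items:
            if a == b:
                continue
            dot = sum(x * y for x, y in zip(vectors[a], vectors[b]))
            denom = norms[a] * norms[b]
            sims[a][b] = dot / denom if denom else 0.0
    return sims


# Hypothetical liked=1 data: a missing entry means "no signal".
ratings = {
    "alice": {"book": 1, "film": 1},
    "bob": {"book": 1, "game": 1},
    "carol": {"book": 1, "film": 1, "game": 1},
}
sims = item_similarities(ratings)
print(round(sims["book"]["film"], 3))  # 0.816
print(round(sims["film"]["game"], 3))  # 0.5
```

With NumPy, the same matrix is the product of the column-normalized users-by-items rating matrix with its own transpose (`Rn.T @ Rn`), which is much faster for large catalogs.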

Now that you have a matrix of every item's similarity to every other item, you can suggest items for user U. Look at her history of items. For each history item I, and for each item ID in your dataset that is not already in her history, add the similarity of I to ID to ID's candidate score. When you've gone through all history items, sort the candidate items by score, descending, and recommend the top ones.
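The scoring loop just described can be sketched in plain Python as follows. Summing similarities follows the description; skipping items the user already has and the `top_n` cut-off are choices made for this sketch, and the names and similarity values are hypothetical. A common variant divides each score by the sum of the similarities that contributed to it, giving a weighted average rather than a sum.

```python
from collections import defaultdict


def recommend(history, sims, top_n=10):
    """Rank unseen items by summed similarity to the user's history.

    `sims` maps item -> {other_item: similarity}, e.g. the output of an
    item-item similarity step; `top_n` is an arbitrary cut-off.
    """
    seen = set(history)
    scores = defaultdict(float)
    for liked in seen:
        for candidate, similarity in sims.get(liked, {}).items():
            if candidate not in seen:  # don't recommend what she already has
                scores[candidate] += similarity
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Hypothetical precomputed similarities.
sims = {
    "book": {"film": 0.8, "game": 0.3, "lamp": 0.1},
    "film": {"book": 0.8, "game": 0.6, "lamp": 0.0},
}
for item, score in recommend(["book", "film"], sims):
    print(item, round(score, 2))
# game 0.9
# lamp 0.1
```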

To answer your remaining questions: a continuous or ordinal scale will give you the best collaborative filtering results, so don't use a "liked" versus "not liked" scale if you have better data.

Matrix factorization algorithms perform well, and if you don't have many users and your rating matrix isn't updated often, you can also use user-user collaborative filtering. Try item-item first, though: it's a good all-purpose recommender algorithm.
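For comparison, here is a minimal sketch of matrix factorization trained by plain stochastic gradient descent on the observed ratings only. It is one simple variant (no user or item bias terms), not the exact method of any particular library; the function names, hyperparameters, and data are assumptions.

```python
import random
from math import sqrt


def factorize(ratings, k=2, epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Plain SGD matrix factorization: rating(u, i) ~ dot(P[u], Q[i]).

    `ratings` is a list of observed (user, item, value) triples; missing
    entries are simply absent. All hyperparameters are illustrative guesses.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(k)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                pf, qf = pu[f], qi[f]
                # Gradient step on squared error plus L2 regularization.
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(P, Q, user, item):
    return sum(a * b for a, b in zip(P[user], Q[item]))


# Hypothetical 1-5 ratings; bob never rated "film", carol never rated "book".
ratings = [
    ("alice", "book", 5), ("alice", "film", 4), ("alice", "game", 1),
    ("bob", "book", 4), ("bob", "game", 2),
    ("carol", "film", 5), ("carol", "game", 1),
    ("dave", "book", 1), ("dave", "film", 2), ("dave", "game", 5),
]
P, Q = factorize(ratings)
rmse = sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))
print(f"training RMSE: {rmse:.2f}")
print(f"predicted bob/film: {predict(P, Q, 'bob', 'film'):.2f}")
```

The learned factors fill in missing cells such as bob/film, which can then be ranked the same way as the item-item scores.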