
I'm attempting to write some code for item-based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list of similar items for a given product, ranked by cosine similarity.
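As a reference point, here is a minimal sketch of that setup in plain Python, with a made-up 4-buyer × 4-product matrix (the data and function names are illustrative, not from any particular implementation):

```python
from math import sqrt

# Hypothetical 0/1 purchase matrix: rows = buyers, columns = products.
purchases = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]

def column(matrix, j):
    return [row[j] for row in matrix]

def cosine(a, b):
    """Cosine similarity between two 0/1 vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

n_items = len(purchases[0])

def similar_items(item, k=3):
    """Top-k items most similar to `item`, comparing matrix columns."""
    target = column(purchases, item)
    scores = [
        (other, cosine(target, column(purchases, other)))
        for other in range(n_items) if other != item
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

For example, `similar_items(0)` ranks products 1 and 2 ahead of product 3, since they share more co-buyers with product 0.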

I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions some form of mean squared error, but that really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) rather than recommending which items a user will purchase.

One approach I was considering was as follows...

  • Split the data into training/holdout sets; train on the training data
  • For each item (A) in the set, select data from the holdout set where users bought A
  • Determine what percentage of those A-buyers bought one of the top 3 recommendations for A

The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.
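The steps above can be sketched roughly as follows (the `holdout` structure and `recommend` function are placeholders for whatever your implementation produces):

```python
def hit_rate(holdout, recommend, items):
    """For each item A: the fraction of holdout buyers of A who also
    bought at least one of A's top-3 recommendations.

    holdout:   dict mapping user -> set of items bought in the holdout set
    recommend: function mapping an item to its top-3 recommended items
    """
    rates = {}
    for a in items:
        buyers = [u for u, bought in holdout.items() if a in bought]
        if not buyers:
            continue  # no holdout buyers of A; skip rather than divide by zero
        top3 = set(recommend(a))
        hits = sum(1 for u in buyers if holdout[u] & top3)
        rates[a] = hits / len(buyers)
    return rates
```

Running the same `hit_rate` over two algorithms trained on the same data gives a per-item score you can compare directly.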


2 Answers


Actually, your approach is quite similar to what's in the literature, but I think you should consider using precision and recall, as most papers do.

http://en.wikipedia.org/wiki/Precision_and_recall

Moreover, if you use Apache Mahout, there is an implementation of precision and recall in the GenericRecommenderIRStatsEvaluator class.


The best way to test a recommender is always to manually verify the results. However, some kind of automatic verification is also good.

In the spirit of a recommender system, you should split your data by time and see if your algorithm can predict the user's future purchases. This should be done for all users.
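A time-based split is straightforward if your purchase events are timestamped; one possible sketch (tuple layout and cutoff are assumptions):

```python
def temporal_split(events, cutoff):
    """Split time-stamped purchase events into train/test at a cutoff time.

    events: iterable of (user, item, timestamp) tuples; everything before
    the cutoff trains the model, everything at or after it is held out
    as the "future" purchases to predict.
    """
    train = [(u, i) for u, i, t in events if t < cutoff]
    test = [(u, i) for u, i, t in events if t >= cutoff]
    return train, test
```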

Don't expect it to predict everything; 100% correctness is usually a sign of over-fitting.