1
votes

I have a movie database with users rating movies. I want to find the top 5 most similar users to user 1 (first MATCH which works fine) and recommend him the top rated movies watched by those similar users but not watched by user 1. I get the same movie multiple times even though I have "distinct" in my query. What am I doing wrong?

MATCH (target_user:User {id : 1})-[:RATED]->(m:Movie)
      <-[:RATED]-(other_user:User)
WITH other_user, count(distinct m.title) AS num_common_movies, target_user
ORDER BY num_common_movies DESC
LIMIT 5
MATCH other_user-[rat_other_user:RATED]->(m2:Movie)
WHERE NOT (target_user-[:RATED]->m2)
WITH distinct m2.title as movietitle, rat_other_user.note AS rating,
       other_user.id AS watched_by
RETURN movietitle, rating, watched_by
ORDER BY rating DESC
1

1 Answers

2
votes

You dataset probably has many users who have watched and rated the same movies. When you execute that DISTINCT statement it is going to return a distinct row, not a distinct movie title. Different users will have rated the unwatched movies differently and have different names.

You will have to tune this for your particular use case but you can start from:

MATCH (target_user:User { uid : 1 })-[:RATED]->(m:Movie)<-[:RATED]-(other_user:User)
WITH other_user, count(DISTINCT m.title) AS num_common_movies, target_user
ORDER BY num_common_movies DESC 
LIMIT 5
MATCH other_user-[rat_other_user:RATED]->(m2:Movie)
WHERE NOT (target_user-[:RATED]->m2)
RETURN DISTINCT m2.name AS movietitle, COLLECT(rat_other_user.note) AS ratings, 
MAX(rat_other_user.note) AS maxi, AVG(rat_other_user.note) as aver, COLLECT(other_user.name) AS users
ORDER BY aver DESC

I added a console demo here.

Importantly the you are now aggregating your results per movie title.