1
votes

I'm currently working on a movie recommendation query that should return the movies with the most "recommendation impact", using the following Cypher query:

match (m:Movie) 
with m, size((m)<-[:LIKED]-(:User)-[:LIKED]->(:Movie)) as score
order by score desc
limit 10
return m.title, score

After reading the graphdb (Neo4j) e-book, my assumption was that this would be an easy query for Neo4j, but it took 32737 ms to execute, which is not what I was expecting. Does anyone have experience with this kind of query and suggestions for improving performance? Or should this query perform well as written, and do I need to tune the Neo4j / Java configuration?

The profile of the query:

[Image: query profile]

The result:

[Image: query result]

2 Answers

1
votes

Maybe this is something you can pre-calculate.

Your score is related to the number of movies liked by each user. Why not calculate and store the number of movies liked by each user (assuming a user can only like a movie once, not multiple times)?

Note that this only makes sense if you only care about the number of movies liked by each user, and are okay with adding those up, even if they represent multiple likes of the same movie across many users.

MATCH (u:User)
SET u.likedCount = SIZE((u)-[:LIKED]->(:Movie))

You will need to update this every time the user likes (or unlikes) another movie.

When this is pre-populated for all users, your scoring query now becomes:

MATCH (m:Movie)
WITH m
MATCH (m)<-[:LIKED]-(u:User)
WITH m, SUM(u.likedCount) as score
ORDER BY score desc
LIMIT 10
RETURN m.title, score

EDIT

This of course also counts each user's like of the movie in question itself. If you need to exclude it, adjust your WITH to:

WITH m, SUM(u.likedCount) - count(u) as score

If you only want to count distinct movies liked by users in your scoring, then you can't pre-calculate and have to use something like stdob--'s answer.
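
To check the arithmetic behind the pre-calculated score, here is a minimal plain-Python sketch (not Neo4j code). The `likes` data and the helper names `path_score` and `precomputed_score` are made up for illustration, and it assumes each user likes a movie at most once and that the original pattern never matches the same LIKED relationship twice, so the movie itself is not counted as its own neighbour.

```python
# Hypothetical toy data: user -> set of liked movies (one like per user per movie).
likes = {
    "alice": {"Alien", "Brazil", "Casablanca"},
    "bob": {"Alien", "Brazil"},
    "carol": {"Alien"},
}

# Stand-in for: SET u.likedCount = SIZE((u)-[:LIKED]->(:Movie))
liked_count = {user: len(liked) for user, liked in likes.items()}


def path_score(movie):
    """Count every (movie)<-[:LIKED]-(user)-[:LIKED]->(other) path with other != movie."""
    return sum(
        1
        for liked in likes.values()
        if movie in liked
        for other in liked
        if other != movie
    )


def precomputed_score(movie):
    """SUM(u.likedCount) - count(u) over the users who liked the movie."""
    fans = [user for user, liked in likes.items() if movie in liked]
    return sum(liked_count[user] for user in fans) - len(fans)


for movie in ("Alien", "Brazil", "Casablanca"):
    print(movie, path_score(movie), precomputed_score(movie))
```

On this toy data both functions give 3 for Alien, 3 for Brazil and 2 for Casablanca, which is the sense in which the adjusted WITH reproduces the path count of the original query.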

1
votes

Try this query:

MATCH (M:Movie)<-[:LIKED]-(:User)-[:LIKED]->(R:Movie)
WITH M, 
     size( collect(distinct R) ) as score
RETURN M.title as title, 
       score 
ORDER BY score DESC LIMIT 10

As an option, if a movie liked by several of these users should count once per user rather than once overall:

MATCH (M:Movie)<-[:LIKED]-(:User)-[:LIKED]->(R:Movie)
RETURN M.title as title, 
       count(R) as score
ORDER BY score DESC LIMIT 10
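
The two queries above can give different scores. The following is a minimal plain-Python sketch (not Neo4j code) of that difference; the `likes` data and the helper names `related`, `count_score` and `distinct_score` are hypothetical, and it assumes one like per user per movie.

```python
# Hypothetical toy data: user -> set of liked movies.
likes = {
    "alice": {"Alien", "Brazil", "Casablanca"},
    "bob": {"Alien", "Brazil"},
    "carol": {"Alien"},
}


def related(movie):
    """Every R reached via (movie)<-[:LIKED]-(:User)-[:LIKED]->(R), one entry per path."""
    return [
        other
        for liked in likes.values()
        if movie in liked
        for other in liked
        if other != movie
    ]


def count_score(movie):
    """Mirrors count(R): a movie is counted once per user who also liked it."""
    return len(related(movie))


def distinct_score(movie):
    """Mirrors size(collect(distinct R)): each related movie is counted once."""
    return len(set(related(movie)))


print(count_score("Alien"), distinct_score("Alien"))
```

For Alien, Brazil is reached through both alice and bob, so the path count is 3 while the distinct count is 2.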