I have three kinds of nodes in my database: 1) User 2) Media 3) Tag
Media nodes are related to one another like so:
(:Media)-[:IS_SIMILAR]-(:Media)
Media nodes are connected to tags by (:Media)-[:HAS_TAG]-(:Tag)
and users are connected to media by (:User)-[:LIKES]-(:Media)
Here's a visualization:
The green nodes are media and the blue node is a user (I excluded the tag nodes).
The IS_SIMILAR relationship has an attribute, similarity, which is the number of tags the two Media nodes in the pair have in common.
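For reference, a minimal sketch of how a shared-tag count like this could be written onto IS_SIMILAR in Cypher. It is an illustration under assumptions, not necessarily how the values in this database were produced: it assumes one HAS_TAG relationship per media/tag pair, and it uses the internal node id only to visit each unordered pair once.

```cypher
// Sketch: count the tags each pair of Media nodes shares.
MATCH (a:Media)-[:HAS_TAG]-(t:Tag)-[:HAS_TAG]-(b:Media)
WHERE id(a) < id(b)                 // assumption: internal ids, used only to avoid (a,b) and (b,a)
WITH a, b, count(DISTINCT t) AS shared
// Create the relationship if missing (either direction counts as a match), then store the count.
MERGE (a)-[s:IS_SIMILAR]-(b)
SET s.similarity = shared
```

Pairs with no tag in common never match the first pattern, so they get no IS_SIMILAR relationship at all.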
I am trying to perform content-based filtering by finding the media a user likes and returning the top 10 other media, ranked by the similarity attribute.
I construct the following query:
MATCH (u:User {id: "Dorian"})-[:LIKES]-(m:Media)
WITH collect(m) AS mu
UNWIND mu AS m
MATCH (m)-[s:IS_SIMILAR]-(o:Media)
WHERE NOT o IN mu
RETURN DISTINCT o, s ORDER BY s.similarity DESC
With the following results:
Unfortunately, the results contain repeated Media nodes, because a single Media node can have an IS_SIMILAR relationship with more than one of the Media nodes the user likes.
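To make the duplication concrete: DISTINCT applies to the whole (o, s) row, so the same o comes back once for every liked media it is similar to, each time with a different s. Below is a rough sketch of collapsing those rows to one per candidate. It assumes that summing the similarity values is an acceptable way to combine them, which may not be the right choice; the `liked` and `score` names are hypothetical.

```cypher
// Sketch: one row per candidate media, scored by total similarity to the liked media.
MATCH (u:User {id: "Dorian"})-[:LIKES]-(m:Media)
WITH collect(m) AS liked
UNWIND liked AS m
MATCH (m)-[s:IS_SIMILAR]-(o:Media)
WHERE NOT o IN liked                   // skip media the user already likes
WITH o, sum(s.similarity) AS score     // aggregating groups by o, so each o appears once
RETURN o, score
ORDER BY score DESC
LIMIT 10                               // the top 10 mentioned above
```

Replacing sum with max would instead rank each candidate by its single strongest link to a liked media.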
Can you suggest:
1) how I can avoid this problem
2) another method to perform content-based recommendation with my schema?