I have two kinds of nodes in my database:
- User
- Media
And one relationship type: LIKES
The relationship between the two kinds of nodes looks like this:
(:User)-[:LIKES]->(:Media)
I'm trying to compute the Jaccard similarity between every pair of users, based on the media they like: the number of media liked by both users divided by the number of media liked by at least one of them.
This similarity is then stored as an "ISSIMILAR" relationship between the two users. The "ISSIMILAR" relationship has a property called "similarity" which holds the computed value.
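To make the metric concrete, here is a minimal sketch of the calculation I have in mind, run on two made-up lists of media names (the values are placeholders, not my real data). It uses list comprehensions rather than filter(), so it should not depend on my graph at all:

```cypher
// Hypothetical media names liked by two users
WITH ['a', 'b', 'c'] AS s1, ['b', 'c', 'd'] AS s2
// Intersection: names that appear in both lists
WITH s1, s2, [x IN s1 WHERE x IN s2] AS inter
// Union: everything in s1 plus the names only in s2
WITH inter, s1 + [x IN s2 WHERE NOT x IN s1] AS uni
// 2 shared names out of 4 distinct names
RETURN toFloat(size(inter)) / size(uni) AS jaccard
```

For these two lists the result is 0.5, and swapping s1 and s2 gives the same value, which is why I expect a single similarity per pair.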
Here's my query:
MATCH (u:User)
WITH collect(u) AS users
UNWIND users AS user
MATCH (user:User {id: user.id})-[:LIKES]->(common_media:Media)<-[:LIKES]-(other:User)
WITH user, other, count(common_media) AS intersection, collect(common_media.name) AS i
MATCH (user)-[:LIKES]->(user_media:Media)
WITH user, other, intersection, i, collect(user_media.name) AS s1
MATCH (other)-[:LIKES]->(other_media:Media)
WITH user, other, intersection, i, s1, collect(other_media.name) AS s2
WITH user, other, intersection, s1, s2
WITH user, other, intersection, s1 + filter(x IN s2 WHERE NOT x IN s1) AS union, s1, s2
WITH ((1.0 * intersection) / size(union)) AS jaccard, user, other
MERGE (user)-[:ISSIMILAR {similarity: jaccard}]-(other)
When I run this query, I see two problems:
- I expect only one "ISSIMILAR" relationship between each pair of users, but two are created.
- The "similarity" properties on these two relationships have different values. They should be the same.
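In case it helps to reproduce what I'm seeing, this is roughly how the duplicates can be listed. It is only a sketch: it assumes every User has the "id" property used in my query above, and the aliases user_a, user_b, relationships and similarities are names I made up for the output columns.

```cypher
// List each pair of users once, with every ISSIMILAR relationship between them
MATCH (a:User)-[r:ISSIMILAR]-(b:User)
// The undirected match returns each pair in both orders; keep one of them
WHERE id(a) < id(b)
RETURN a.id AS user_a, b.id AS user_b,
       count(r) AS relationships,
       collect(r.similarity) AS similarities
ORDER BY relationships DESC
```

Any row where relationships is greater than 1 is a pair affected by the first problem, and its similarities list shows the differing values from the second.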
Here's the query I used to visualize the issue:
MATCH (user:User)-[r]-(o:User) RETURN o, user, r LIMIT 4
Thanks in advance