I have two kinds of nodes in my database:

  1. USER
  2. MEDIA

And one relationship - "LIKES"

The relationship between the two nodes is described like so:

(:USER)-[:LIKES]->(:MEDIA)

I'm trying to compute the similarity between all "USER" nodes based on the media that each pair of users has in common (Jaccard similarity).

The similarity is then stored as an "ISSIMILAR" relationship, which has a property called "similarity" that holds the computed value.

Here's my query:

Match(u:User)

WITH COLLECT(u) as users

UNWIND users as user

MATCH(user:User{id:user.id})-[:LIKES]->(common_media:Media)<-[:LIKES]-(other:User)

WITH user,other,count(common_media) AS intersection, COLLECT(common_media.name) as i

MATCH(user)-[:LIKES]->(user_media:Media)

WITH user,other,intersection,i, COLLECT(user_media.name) AS s1

MATCH(other)-[:LIKES]->(other_media:Media)

WITH user,other,intersection,i,s1, COLLECT(other_media.name) AS s2

WITH user,other,intersection,s1,s2

WITH user,other,intersection,s1+filter(x IN s2 WHERE NOT x IN s1) AS union, s1,s2

WITH ((1.0*intersection)/SIZE(union)) as jaccard,user,other

MERGE(user)-[:ISSIMILAR{similarity:jaccard}]-(other)

Running this query, I have two issues:

  1. I expect only one "ISSIMILAR" relationship between a pair of nodes, but the query creates two.
  2. The "similarity" property has a different value on each of the two "ISSIMILAR" relationships. The values should be the same.

Here's a visualization of the issue:

MATCH(user:User)-[r]-(o:User) return o,user,r limit 4


Thanks in advance

1 Answer

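The two similarity relationships arise because every pair of users is processed twice, once as (user, other) and once as (other, user), and nothing excludes the pair that was already handled. Since the similarity value is part of the MERGE pattern, a differing value on the second pass also stops MERGE from matching the relationship created on the first pass, so a new one is created. You can avoid this by keeping only one ordering of each pair: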

...
UNWIND users as user
  UNWIND users as other 
    WITH user, other WHERE ID(user) > ID(other)
    MATCH(user)-[:LIKES]->(common_media:Media)<-[:LIKES]-(other) 
...
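
To illustrate why the ID comparison leaves exactly one row per pair of users, here is a minimal Python sketch that runs outside Neo4j. The integer IDs are made-up stand-ins for ID(user) and ID(other), not values from your database.

```python
from itertools import product

# Hypothetical internal node IDs standing in for ID(user) / ID(other).
user_ids = [10, 11, 12]

# Equivalent of the two UNWINDs: every ordered pair, including
# self-pairs (a, a) and both orderings (a, b) and (b, a).
all_pairs = list(product(user_ids, repeat=2))

# Equivalent of WHERE ID(user) > ID(other): drops self-pairs and
# keeps a single ordering of each remaining pair.
unique_pairs = [(a, b) for a, b in all_pairs if a > b]

print(len(all_pairs))  # 9
print(unique_pairs)    # [(11, 10), (12, 10), (12, 11)]
```

With three users, the nine ordered combinations shrink to three, one per unordered pair, so MERGE runs only once for each pair.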

The final query can also be made clearer:

MATCH (u:User) WITH COLLECT(u) AS users
UNWIND users AS user
UNWIND users AS other

MATCH (user)-[:LIKES]->(common_media:Media)<-[:LIKES]-(other) WHERE ID(other) > ID(user)
WITH user, other, COLLECT(common_media) AS intersection

MATCH (user)-[:LIKES]->(user_media:Media)
WITH user, other, intersection, 
     COLLECT(user_media) AS s1

MATCH (other)-[:LIKES]->(other_media:Media)
WITH user,other,intersection, s1, 
     COLLECT(other_media) AS s2

WITH user, other,
     (1.0 * SIZE(intersection)) / (SIZE(s1) + SIZE(s2) - SIZE(intersection)) AS jaccard

MERGE (user)-[:ISSIMILAR {similarity: jaccard}]->(other)
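
As a sanity check of the formula used in the query, here is a small Python sketch of the same computation on plain sets. The helper name `jaccard` and the sample media names are hypothetical, and it assumes each user likes a given media item at most once, so the collected lists behave like sets.

```python
def jaccard(a, b):
    """Jaccard similarity via |A ∩ B| / (|A| + |B| - |A ∩ B|)."""
    a, b = set(a), set(b)
    inter = len(a & b)
    denom = len(a) + len(b) - inter  # equals the size of the union
    # Two empty sets have no defined ratio; return 0.0 by convention here.
    return inter / denom if denom else 0.0


# Hypothetical liked-media sets for two users.
alice = {"m1", "m2", "m3"}
bob = {"m2", "m3", "m4", "m5"}

print(jaccard(alice, bob))                         # 0.4
print(jaccard(alice, bob) == jaccard(bob, alice))  # True
```

The value is the same in both directions, which is why a single ISSIMILAR relationship per pair is enough.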