0
votes

I have number of User nodes and Skills nodes. The relationship is between skills and users.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/xyz.csv" AS row FIELDTERMINATOR '|'
WITH row
LIMIT 15

CREATE (u:User {company: row.company, salary: row.salary_float,         designation: row.designation, experience: row.experience_float})

FOREACH (s IN split(row.tag_skill, "@") |
MERGE (skill:SKILL {name: s})
ON CREATE SET skill.name = s
CREATE (u)-[:KNOWS]->(skill))

I also need a relationship between user nodes where if User A is connected to a number for Skill nodes [s1,s2,s3,s4,s5,s6] and if User B is connected to [s1,s3,s4,s6]

User A and User B are in relationship(similar) if atleast 50% of their skills match.

in this example, A is in a relationship with B since they have s1,s3,s4,s6 in common which is more than 50% match

Cant seem to figure out this cypher query.

1
Check this question: stackoverflow.com/questions/40454537/… - it's quite similar to your problem.Gabor Szarnyas

1 Answers

2
votes

Here is an adaptation of my answer for a similar question:

MATCH (u1:User)-[:KNOWS]->(:Skill)<-[:KNOWS]-(u2:User) // (1)
MATCH
  (u1)-[:KNOWS]->(s1:Skill),
  (u2)-[:KNOWS]->(s2:Skill) // (2)
WITH
  u1, u2, 
  COUNT(DISTINCT s1) AS s1Count, COUNT(DISTINCT s2) AS s2Count // (3)
MATCH (u1)-[:KNOWS]->(s:Skill)<-[:KNOWS]-(u2) // (4)
WITH
  u1, u2,
  s1Count, s2Count, COUNT(s) AS commonSkillsCount // (5)
WHERE
  // we only need each u1-u2 pair once
  ID(u1) < ID(u2) AND // (6)
  // similarity
  commonSkillsCount / 0.5 >= s1Count AND
  commonSkillsCount / 0.5 >= s2Count // (7)
RETURN u1, u2
ORDER BY u1.name, u2.name

We look for u1, u2 Users that have at least a single common Skill (1). We then gather their individual skills separately (2) and count them (3), and also gather their mutual Skills (4) and count them (5). We then remove one of the (u1, u2) and (u2, u1) pairs (e.g. from (Alice, Bob) and (Bob, Alice) we only keep the former (6) and check if the number of their common skills is above the threshold (7). (Floating point arithmetics is sometimes tricky in Cypher - IIRC if we move the / 0.5 to the right side of the inequality as sXCount * 0.5, we'd have to use the toFloat() function).