1
votes

I just recently started with graph databases and Neo4j. I have created nodes with the following labels: Person, Company, City, Skill which have one or more relationships in between them.

I am trying to do a cypher query which finds one Person node (by node id) and then I want to find all the skills, Companies etc related to this Person. This is my query so far:

MATCH (person:Person)
WHERE ID (person) = 123
WITH person LIMIT 1
OPTIONAL MATCH (p:Person) - [:HAS_SKILL] -> (skill:Skill)
WHERE ID (p) = 123
WITH person, skill ORDER BY skill.name ASC
OPTIONAL MATCH (p:Person) - [:WORKED_AT] -> (company:Company)
WHERE ID (p) = 123
WITH person, skill, company ORDER BY company.name ASC
OPTIONAL MATCH (p:Person) - [:LIVES_IN] -> (city:City)
WHERE ID (p) = 123
WITH person
, collect(DISTINCT skill) as skills
, collect(DISTINCT company) as companies
, city LIMIT 1
RETURN person, skills, companies, city

I'm concerned that this is not really an optimal query and it looks like it will be really slow if/when we get lets say a couple of million nodes. Can someone tell me a better way of structuring this kind of query? Should I divide it up to several smaller queries?

1

1 Answers

2
votes

I think the first thing is that you do not need to keep querying the person object, although I am not sure that this will impact performance as the person variable is already bound. That reduces your query to:

MATCH (person:Person)
WHERE ID (person) = 123
WITH person
OPTIONAL MATCH (person) - [:HAS_SKILL] -> (skill:Skill)
WITH person, skill ORDER BY skill.name ASC
OPTIONAL MATCH (person) - [:WORKED_AT] -> (company:Company)
WITH person, skill, company ORDER BY company.name ASC
OPTIONAL MATCH (person) - [:LIVES_IN] -> (city:City)
WITH person
, collect(DISTINCT skill) as skills
, collect(DISTINCT company) as companies
, city LIMIT 1
RETURN person, skills, companies, city

You also do not need to continue to use the WITH as you can have multiple MATCH statements, but now I have removed your order statements:

MATCH (person:Person)
WHERE ID (person) = 123
OPTIONAL MATCH (person) - [:HAS_SKILL] -> (skill:Skill)
OPTIONAL MATCH (person) - [:WORKED_AT] -> (company:Company)
OPTIONAL MATCH (person) - [:LIVES_IN] -> (city:City)
RETURN  person
, collect(DISTINCT skill) as skills
, collect(DISTINCT company) as companies
, city LIMIT 1

I believe you can then order more like this, but you're going to end up with an odd sort order!:

MATCH (person:Person)
WHERE ID (person) = 123
OPTIONAL MATCH (person) - [:HAS_SKILL] -> (skill:Skill)
OPTIONAL MATCH (person) - [:WORKED_AT] -> (company:Company)
OPTIONAL MATCH (person) - [:LIVES_IN] -> (city:City)
WITH DISTINCT person, skill, company, city
ORDER BY skill.name, company.name
RETURN  person
, collect(DISTINCT skill) as skills
, collect(DISTINCT company) as companies
, city LIMIT 1

Whether that is any more performant, you'll have to test. If you wanted to impose limits or aggregates though, you would need to use your WITH statements as you originally were.

Just remembered this blog post on optimisation by Mark Needham, not directly relevant but contains some rules that are worth referring back to:

  1. Use parameters whenever you can.
  2. Avoid cartesian products when they aren’t required to get the data you need.
  3. Avoid patterns in the WHERE clause
  4. Start your MATCH patterns at the lowest cardinality identifier you can (ideally 1), and expand outward.
  5. Separate your MATCH patterns, doing the minimal amount of expansion for each pattern. Add 1 new identifier per pattern.
  6. Call for Wes!