2
votes

The query I've written returns accurate results based on some random testing I've done. However, the query execution takes really long (7699.43 s) I need help optimising this query.

count(Person) -> 67895

count(has_POA) -> 355479

count(POADocument) -> 40

count(issued_by) -> 40

count(Company) -> 21

count(PostCode) -> 9845

count(Town) -> 1673

count(in_town) -> 9845

count(offers_services_in) -> 17107

All the entity nodes are indexed on Id's (not Neo4j IDs). The PostCode nodes are also indexed on PostCode.

enter image description here enter image description here

MATCH pa= (p:Person)-[r:has_POA]->(d:POADocument)-[:issued_by]->(c:Company),
(pc:PostCode),(t:Town) WHERE r.recipient_postcode=pc.PostCode AND (pc)-
[:in_town]->(t) AND NOT (c)-[:offers_services_in]->(t) RETURN p as Person,r 
as hasPOA,t as Town, d as POA,c as Company

Much thanks in advance! -Nancy

2

2 Answers

3
votes

I made some changes in your query:

MATCH (p:Person)-[r:has_POA {recipient_code : {code} }]->(d:POADocument)-[:issued_by]->(c:Company),
    (pc:PostCode {PostCode : {PostCode} })-[:in_town]->(t:Town)
WHERE NOT (c)-[:offers_services_in]->(t)
    RETURN p as Person, r as hasPOA, t as Town, d as POA, c as Company
  1. Since you are not using the entire path, removed pa variable
  2. Moved the pattern existence check ((pc)-[:in_town]->(t)) from WHERE to MATCH.
  3. Using parameters instead of the equality check r.recipient_postcode = pc.PostCode in where. If you are running the query in Neo4j Browser, you can set the parameters running the command :params {code : 10}.
1
votes

Here is a simplified version of your current query.

MATCH (p:Person)-[r:has_POA]->(d:POADocument)-[:issued_by]->(c:Company)
MATCH (t:Town)<-[:in_town]-(pc:PostCode{PostCode:r.recipient_postcode})
WHERE NOT (c)-[:offers_services_in]->(t)
RETURN p as Person,r as hasPOA,t as Town, d as POA,c as Company

Your big performance hits are going to be on the Cartesian product between all the match sets, and the raw amount of data you are asking for.

In this simplified version, I'm using one less match, and the second match uses a variable from the first match to avoid generating a Cartesian product. I would also recommend using LIMIT and SKIP to page your results to limit data transfer.

If you can adjust your model, I would recommend converting the has_POA relation to an issued_POA node so that you can take advantage of Neo4j's relation finding on the 2 postcodes related to that instance, and making the second match a gimme instead of an extra indexed search (after you adjust the query to match the new model, of course).