0
votes

I have a Neo4j graph (ver. 2.2.2) with large number of relationships. For examaple: 1 node "Group", 300000 nodes "Data", 300000 relationships from "Group" to all existing nodes "Data". I need to check if there is a relationship between set of Data nodes and specific Group node (for example for 200 nodes). But Cypher query I used is very slow. I tried many modifications of this cypher but with no result.

Cypher to create graph:

FOREACH (r IN range(1,300000) | CREATE (:Data {id:r}));
CREATE (:Group);
MATCH (g:Group),(d:Data) create (g)-[:READ]->(d);

Query 1: COST. 600003 total db hits in 730 ms. Acceptable but I asked only for 1 node.

PROFILE MATCH (d:Data)<-[:READ]-(g:Group) WHERE id(d) IN [10000] AND id(g)=300000 RETURN id(d);

Query 2: COST. 600003 total db hits in 25793 ms. Not acceptable.

You need to replace "..." with real numbers of nodes from 10000 to 10199

PROFILE MATCH (d:Data)<-[:READ]-(g:Group) WHERE id(d) IN [10000,10001,10002 " ..." ,10198,10199] AND id(g)=300000 RETURN id(d);

Query 3: COST. 1000 total db hits in 309 ms. This is only one solution I found to make query acceptable. I returned all ids of nodes "Group" and manualy filter result in my code to return only relationships to node with id 300000

You need to replace "..." with real numbers of nodes from 10000 to 10199

PROFILE MATCH (d:Data)<-[:READ]-(g:Group) WHERE id(d) IN [10000,10001,10002 " ..." ,10198,10199] RETURN id(d), id(g);

Question 1: Total DB hits in query 1 is surprising but I accept that physical model of neoj defines how this query is executed - it needs to look into every existing relation from node "Group". I accept that. But why is so big difference in execution time between query 1 and query 2 if number of db hits is the same (and exucution plan is the same)? I'm only returning id of node, not large set of properties.

Question 2: Is a query 3 the only one solution to optimize this query?

1

1 Answers

1
votes

Apparently there is an issue with Cypher in 2.2.x with the seekById.

You can prefix your query with PLANNER RULE in order to make use of the previous Cypher planner, but you'll have to split your pattern in two for making it really fast, tested e.g. :

PLANNER RULE
MATCH (d:Data) WHERE id(d) IN [30]
MATCH (g:Group) WHERE id(g) = 300992
MATCH (d)<-[:READ]-(g)
RETURN id(d)