Setup:
- Neo4j - 1.9.3
- ~7,000 nodes
- ~1.8 million relationships
I have the following cypher query that I would like to improve the performance on:
START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
This returns 471 (188171 ms).
Right now I'm only getting a count but later I may want to get the values (471 in this example). The problem is it takes about 3-4 minutes to run.
The graph is highly connected with many relationships. Running the following shows how many edges of type "knowledge" exist for node a(2).
START a=node(2) MATCH (a)-[:knowledge]-(x) RETURN COUNT(a);
This returns 4350 (103 ms).
To me, this doesn't seem like many edges to check. Can I split this up somehow to improve performance?
edit: As per the comments, here are the results from running the query with profile:
profile START a=node(2) MATCH (a)-[:knowledge]-(x)-[:depends]-(y)-[:knowledge]-(end) RETURN COUNT(DISTINCT end);
==> +---------------------+
==> | COUNT(DISTINCT end) |
==> +---------------------+
==> | 471 |
==> +---------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a"], returnItemNames=["COUNT(DISTINCT end)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEcd2aff18-1c9d-47a8-9217-588cb86bbc1a,Distinct)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(a)-[ UNNAMED7:knowledge WHERE true AND true]-(x)-[ UNNAMED8:depends WHERE true AND true]-(y)-[ UNNAMED9:knowledge WHERE true AND true]-(end)", _rows=25638262, _db_hits=25679365)
==> ParameterPipe(_rows=1, _db_hits=0)
(x)-[:depends]-(y)
and(y)-[:knowledge]-(end)
match just as many you have 82312875000 paths. In a graph with many relationships to nodes, what Wes says about specifying direction could be particularly helpful in ruling out irrelevant paths. You may want to run the query in webadmin withPROFILE
prepended and look at (post here) the execution plan, which would show how many matches (rows) are maintained at each step of the query. Also, try returning withoutDISTINCT
to see how manyend
were actually matched. – jjaderberg