0
votes

I am using Neo4j Community 4.0.4.
I have encountered this issue using the offical Bolt driver for Python, but it is also completely reproducible in the Neo4j browser (version 4.0.7).

I have a very simple graph for now, consisting of the following node and relationship types:

(:Document)-[:contains]->(:Block)
(:Block)<-[:prev]-(:Block)-[:next]->(:Block)

There are only 75 nodes in my entire test database for now - 1 Document node and 74 Block nodes.

Running the following Cypher statement brings the CPU to 100% and the memory utilization rises indefinitely, after which I have to kill the session:

match (d:Doc{name: 'doc name'}) 
optional match (d)-[*]-(n) 
return d,n

I also got the Java heap size error at some point. It only starts to work if I set a strict upper bound on the relationship or specify the direction, e.g.:

optional match (d)-[*..5]->(n)

For example, this already does not work (the answer takes forever so I have to kill the session):

optional match (d)-[*..5]-(n)

Considering that (a) I am doing a strictly local graph traversal that graph databases are supposed to be exceptionally good at, (b) clusters associated with different starting nodes are NOT connected and (c) my test data set is tiny, how can this be happening?

From the symptoms it appears that the engine simply does not keep track of which nodes and relationships were already visited when preparing the results ... or am I missing something?

UPDATE:

This was just answered via the Neo4j community forum by a Neo4j staff member: https://community.neo4j.com/t/getting-paths-of-any-length-or-long-paths-does-not-work/18298

I wrongly assumed that Cypher would just dynamically switch from the path uniqueness traversal to the node uniqueness traversal just because the operation following the match dealt only with nodes and not with relationships.
Poor assumption on my part - not only Cypher doesn't do it automatically, there is no way AT ALL in core Cypher to drop a path during traversal if all the nodes in the path were aleady visited.

The APOC-based solution was suggested:

match (d:Doc{name: 'doc name'}) 
CALL apoc.path.subgraphNodes(d, {}) YIELD node as n
return d, n

In my case I have disconnected sub-graphs that are tens of thousands of nodes each and are relatively dense. This came up when trying to delete a (:Doc) node and everything that's connected to it before re-loading a new version of the sub-graph into Neo4j:

disconnect delete d, n

I see this task of "removing the old version before re-loading" as a very common operational task for sub-graphs that many people may have in their use cases... Installing and managing additional libraries (like APOC or the Graph Data Science library) seems like an overkill for something this simple... But it's either that or making the deletions more targeted.

1

1 Answers

2
votes

A MATCH clause avoids traversing the same relationship twice, so that would not be the issue. However, it can still travel between the same 2 nodes multiple times (as long as different relationships are used).

The main thing to consider is that variable-length relationship patterns have exponential (time and memory) complexity. If the nodes being traversed have an average of R relevant relationships, then the MATCH clause has to traverse about R**P possible paths of length P. The higher that P gets (especially with no upper bound), the worse it gets. But a high R also hurts.