Trying to find all possible paths (even if up to length of 20) in a graph of millions of nodes is likely to cause a memory overflow.
What you can do is break it down into smaller segments. The query won't be as elegant, but it should work.
For example, if we are doing a path length of 5 at a time, a two-segment query will look like this:
MATCH p1 = (n1:Label)-[r1*..5]-(n2:Label), p2 = (n2:Label)-[r2*..5]-(n3:Label)
WHERE all(x1 in nodes(p1) WHERE (x1:Label))
AND all(x2 in nodes(p2) WHERE (x2:Label))
RETURN r1, r2
The cost plan for this query looks like so:
+-----------------------+----------------------+---------------------+
| Operator | Variables | Other |
+-----------------------+----------------------+---------------------+
| +ProduceResults | r1 | r1 |
| | +----------------------+---------------------+
| +Filter | n1, n2, n3, r1, r2 | [see below] |
| | +----------------------+---------------------+
| +VarLengthExpand(All) | n1, r1 -- n2, n3, r2 | (n2)-[r1:*..5]-(n1) |
| | +----------------------+---------------------+
| +Filter | n2, n3, r2 | n3:Label |
| | +----------------------+---------------------+
| +VarLengthExpand(All) | n3, r2 -- n2 | (n2)-[r2:*..5]-(n3) |
| | +----------------------+---------------------+
| +NodeByLabelScan | n2 | :Label |
+-----------------------+----------------------+---------------------+
So you can see that straight after the first expand a filter will filter any paths that don't start and end with :Label
and only then the second expand will happen.
So long your Neo version is 2.2 or above, p1
and p2
will not include the same relationships.
You can actually see this filtering done in the filter preceding the ProduceResults
operator (second line):
all(x1 in NodesFunction(ProjectedPath(Set(r1, n1),)) where x1:Label)
AND none(r1 in r1 where any(r2 in r2 where r1 == r2))
AND n1:Label
Now you should also see that the we only check for all nodes in the path on that last filter. So a path like this: (a:Label)--(b:Blah)--(c:Label) will still past the first segment and only be filtered before the result production.
So you can further optimise by checking that all segment nodes have :Label
and then do a manual check for relationship similar past relationships. Only showing the second stage:
WITH n2, r1
MATCH p2 = (n2:Label)-[r2*..5]-(n3:Label)
WHERE all(x2 in nodes(p2) WHERE (x2:Label))
AND none(r1 in r1 where any(r2 in r2 where r1 == r2))
I forgot to mention, but remember that queries as such are executed in a lazy manner.
r:REL_TYPE1|REL_TYPE2
(listing the possible relationship types) to theMATCH
2. check out the APOC library's path expander. – Gabor Szarnyas