0
votes

i have the following requirements:

  • Given a source node
  • Find all nodes within a certain range (e.g. 4 hops)
  • and the destination nodes has a special label "x"
  • Restrict the label type of the nodes in the path to the destination
  • return only the shortest path length (e.g. if i find a node with 2 hops, dont find also nodes with 3 or 4 hops)
  • Return all nodes needed for showing the destination nodes and the path between

i managed to create a query but the performance is not so well. i assume it is because of the high amount of nodes with the label "x"

MATCH path = allShortestPaths((source)-[*..4]-(destination))
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
AND ALL(x IN nodes(path)[1..] WHERE any(l in labels(x) WHERE l in ['A', 'B', 'C']))
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength 
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
WITH [n in nodes(path)] as nodes
return nodes

Profile: Profile with shortest path

if i change the query to use not the shortest path function, this works well when the source has not a lot of outgoing Paths

MATCH path = ((source)-[*..4]-(destination))
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
AND ALL(x IN nodes(path)[1..] WHERE any(l in labels(x) WHERE l in ['A', 'B', 'C']))
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength 
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
WITH [n in nodes(path)] as nodes
return nodes

Profile: Profile without shortest path

But if i have a source node with many children this has also a bad performance...

in the moment i am thinking if i start with a simple search for all destinations and call a shortestPath on each found destination this might be better but i am not very sure.

e.g.

MATCH (source)-[*..4]-(destination)
WHERE source.objectID IN ['001614914']
AND source:Y
AND destination:X
WITH destination
LIMIT 100
call apoc (shortest path ...)
...

Or is there a better approach?

1

1 Answers

2
votes

You may want to try APOC's path expander using 'NODE_GLOBAL' uniqueness, it typically performs better than a variable-length match. It also has a means for whitelisting nodes during traversal, but this does apply to the start node too, so we'd have to include :Y in the whitelist.

See if this works for you:

MATCH path = (source:Y)
WHERE source.objectID IN ['001614914']
CALL apoc.path.expandConfig(source, {labelFilter:'+A|B|C|Y', maxLevel:4, uniqueness:'NODE_GLOBAL'}) YIELD path
WITH path, last(nodes(path)) as destination
WHERE destination:X AND NONE(node in TAIL(nodes(path)) WHERE node:Y)
// all the rest is the same as your old query
WITH path
LIMIT 1000
WITH COLLECT(path) AS paths, MIN(length(path)) AS minLength 
WITH FILTER(p IN paths WHERE length(p)= minLength) AS pathList
LIMIT 25
UNWIND pathList as path
RETURN NODES(path) as nodes