I'm looking to understand whether anything can be done to make the query below perform well on a large graph. I'm trying to find the shortest path between two nodes while excluding paths that contain certain other kinds of nodes. The issue seems to be the WHERE clause: with it, the query grinds to a halt.

MATCH p=shortestPath((p1:Party{suprRC:"21"})-[*..15]-(p2:Party{suprRC:"21"}))
WITH p
WHERE NONE(n IN nodes(p) WHERE labels(n) IN [["Reporter"],["FirstName"],["LastName"]])
RETURN p LIMIT 500;
You might try the WHERE directly on the MATCH (in other words, leave the WITH p out). - Tom Geudens
Thanks for the suggestion Tom - I'm afraid it didn't help though. - Steve Vejcik
A couple of questions: 1) Are the starting node and the end node the same? Is suprRC a unique property for a Party? If not, are the Party nodes indexed on suprRC (I know that should be obvious, I'm just excluding options here)? When you execute just the MATCH (with the RETURN and LIMIT but without the WITH and WHERE), is that performant? Can you share the output of an EXPLAIN of this query? - Tom Geudens
Hi Tom - 1) The starting node and end node are not the same. suprRC is not unique for a party. I have not indexed the party nodes yet - I know I should, but since the query dies when I add the piece filtering the paths for node labels I felt reasonably confident that this wasn't the issue. When I omit the WHERE and WITH pieces it is very performant. In fact, the last two node exclusions, in particular 'FirstName', are what kill the performance. - Steve Vejcik
Looks like I can't add an image to SO quite yet. It's short enough, here's my ascii version of it: - Steve Vejcik

1 Answer

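The comparison is wrong. labels(n) returns a list of labels, so you are testing a whole list for membership in a list of single-element lists. That only matches when a node carries exactly one label; a node with any additional label is never excluded. What you want is to test each label individually against a flat list of strings.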
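To see the difference without touching your data, here is a small sketch that uses literal lists in place of labels(n). The 'Person' label is made up purely for illustration; it stands for any extra label a node might carry.

```cypher
// Self-contained: needs no data in the graph.
// The literal lists stand in for labels(n); 'Person' is a hypothetical extra label.
RETURN
  // whole-list comparison, node with exactly one label
  ['Reporter'] IN [['Reporter'], ['FirstName'], ['LastName']] AS singleLabelNode,
  // whole-list comparison, node with an additional label
  ['Reporter', 'Person'] IN [['Reporter'], ['FirstName'], ['LastName']] AS multiLabelNode,
  // per-label comparison against a flat list of strings
  any(l IN ['Reporter', 'Person']
      WHERE l IN ['Reporter', 'FirstName', 'LastName']) AS perLabelCheck;
```

This should come back as true, false, true: the whole-list comparison misses the node with two labels, while the per-label check catches it.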

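As mentioned by Tom, you can leave off the WITH, but you will need two nested predicates: an outer NONE over the nodes of the path and an inner ANY over the labels to exclude. (Nesting NONE inside NONE would do the opposite and keep only paths where every node has one of those labels.)

MATCH p=shortestPath((p1:Party{suprRC:"21"})-[*..15]-(p2:Party{suprRC:"21"}))
WHERE NONE(x IN nodes(p)
           WHERE ANY(l IN ['Reporter', 'LastName', 'FirstName']
                     WHERE l IN labels(x)
                    )
          )
RETURN p LIMIT 500;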
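
Separately from the predicate: the comments mention that the Party nodes are not yet indexed on suprRC. A rough sketch of adding that index, assuming Neo4j 4.x or later (the index name party_suprRC is made up for illustration; the older 3.x form is shown in the comment):

```cypher
// Hypothetical index name; syntax for Neo4j 4.x and later.
CREATE INDEX party_suprRC FOR (p:Party) ON (p.suprRC);

// Neo4j 3.x equivalent (unnamed index):
// CREATE INDEX ON :Party(suprRC);
```

Prefixing the shortestPath query with EXPLAIN shows the plan without executing it, which is a cheap way to check whether the two Party lookups actually use the index.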