3
votes

my problem is the following. I have a small but dense network in Neo4j (~280 nodes, ~3600 relationships). There is only one type of node and one type of edge (i.e. a single label for each). Now, I'd like to specify two distinct groups of nodes, given by values for their "group" property, and match the subgraph consisting of all paths up to a certain length connecting the two groups. In addition I would like to add constraints on the relations. So, at the moment I have this:

MATCH (n1) WHERE n1.group={group1} 
MATCH (n2) WHERE n2.group={group2}
MATCH p=(n1)-[r*1..3]-(n2) 
WHERE ALL(c IN r WHERE c.weight > {w}) 
AND ALL(n in NODES(p) WHERE 1=length(filter(m in NODES(p) WHERE m=n))) 
WITH DISTINCT r AS dr, NODES(p) AS ns 
UNWIND dr AS udr UNWIND ns AS uns 
RETURN COLLECT(DISTINCT udr), COLLECT(DISTINCT uns)

which achieves what I want but in some cases seems to be too slow. Here the WHERE statement filters out paths with relationships whose weight property is below a threshold as well as those containing cycles.

The last three lines have to do with the desired output format. Given the matching subgraph (paths), I want all unique relationships in one list, and all unique nodes in another (for visualization with d3.js). The only way I found to do this is to UNWIND all elements and then COLLECT them as DISTINCT.

Also note that the group properties and the weight limit are passed in as query parameters.

Now, is there any way to achieve the same result faster? E.g., with paths up to a length of 3 the query takes about 5-10 seconds on my local machine (depending on the connectedness of the chosen node groups), and returns on the order of ~50 nodes and a few hundred relationships. This seems to be in reach of acceptable performance. Paths up to length 4 however are already prohibitive (several minutes or never returns).

Bonus question: is there any way to specify the upper limit on path length as a parameter? Or does a different limit imply a totally different query plan?

1
I would be tempted to add labels to the query even though you only have one set of nodes. Neo4j is setup to optimize with labels; might buy you something. As well, can you reduce your 3 MATCH statements into one? (e.g. something like this MATCH p=(n1:Label)-[r:CONNECTED*1..3]-(n2:Label) WHERE n1.group ={group1} AND n2.group = {group2} )Dave Bennett
I'm actually using node labels already in the real query, which I have left out here for conciseness. According to the browser console there is no significant difference though with or without labels in this case. The 3 separate matches in contrast are an optimization already. It's much faster than a single match statement. I don't really know how it works under the hood, but the reason seems to be that after the first two matches the more expensive relationship pattern is applied to a much reduced results set.thomas

1 Answers

0
votes

This probably won't work at all, but it might give you something to play with. I tried changing a few things that may or may not work.

MATCH (n1) WHERE n1.group={group1} 
MATCH (n2) WHERE n2.group={group2}
MATCH p=(n1)-[r*1..3]-(n2) 
WHERE r.weight > {w}
WITH n1, NODES(p) AS ns, n2, DISTINCT r AS dr
WHERE length(ns) = 1
UNWIND dr AS udr UNWIND ns AS uns 
RETURN COLLECT(DISTINCT udr), COLLECT(DISTINCT uns)