Cypher/neo4j return paths with a common starting and endpoint, with the endpoint filtered by number of paths

Question

Problem: I want to use cypher to return paths where I can specify the start point, and filter by the number of connections coming into a termination point for those paths.

Some dummy example data:

path1: (a1:a)--(b1:b)--(c1:c)--(d1:d)

path2: (a1:a)--(b2:b)--(c2:c)--(d1:d)

path3: (a1:a)--(b3:b)--(c3:c)--(d2:d)

path4: (a1:a)--(b2:b)--(c2:c)--(d3:d)

path5: (a2:a)--(b4:b)--(c4:c)--(d3:d)

Aim: I want to bring back all paths that start with a1, and end with dn, where the count of relationships to dn from paths that begin at a1 is > 1 (or 2, or 3… in the above example we’ll use >1 but I want to be able to change this for real data where relationship counts might be much higher).

In the example above I want to include path 1 and path 2, because they both start with a1 and terminate at d1, and the count of paths starting at a1 and terminating at d1 is 2 (i.e. >1).

paths 3 and 4 will be excluded because although they start with a1, there are no other paths starting with a1 that terminate in d2 or d3. In other words, d2 and d3 are unique in the context of paths starting with a1.

path 5 will be excluded because it does not start with a1, even although there are >1 paths that terminate at d3

All the intermediate nodes are basically irrelevant, aside from being able to specify their labels during the query and get the nodes that make up the path back at the end.

I have looked but I can't find anything addressing this problem elsewhere

ThirstForKnowledge ThirstForKnowledge · Accepted Answer · 2018-11-28T14:36:33

Your graph

For the ease of possible further answers and solutions I note my graph creating statement:

CREATE
  (a1:LabelA {name: 'A1'})-[:BELONGS_TO]->(b1:LabelB {name: 'B1'})-[:BELONGS_TO]->(c1:LabelC {name: 'C1'})
    -[:BELONGS_TO]->(d1:LabelD {name: 'D1'}),
  (a1)-[:BELONGS_TO]->(b2:LabelB {name: 'B2'})-[:BELONGS_TO]->(c2:LabelC {name: 'C2'})-[:BELONGS_TO]->(d1),
  (a1)-[:BELONGS_TO]->(b3:LabelB {name: 'B3'})-[:BELONGS_TO]->(c3:LabelC {name: 'C3'})
    -[:BELONGS_TO]->(d2:LabelD {name: 'D2'}),
  (c2)-[:BELONGS_TO]->(d3:LabelD {name: 'D3'}),
  (a2:LabelA {name: 'A2'})-[:BELONGS_TO]->(b4:LabelB {name: 'B4'})-[:BELONGS_TO]->(c4:LabelC {name: 'C4'})
    -[:BELONGS_TO]->(d3);

Solution

MATCH path = (:LabelA {name:'A1'})-[:BELONGS_TO*]->(endNode:LabelD) 
WITH endNode, count(endNode) AS endNodeAmount WHERE endNodeAmount > 1 
RETURN endNode.name AS endNode, endNodeAmount;

Underlying idea

first line: define the pattern for an A1 to Dn path of arbitrary length
second line: count for each endNode (Dn) the amount of occurrence and filter for it, the figure 1 can be replaced by a parameter $relationshipAmount
third line: present the findings

Result

╒═════════╤═══════════════╕
│"endNode"│"endNodeAmount"│
╞═════════╪═══════════════╡
│"D1"     │2              │
└─────────┴───────────────┘

Extension: Variant "Paths"

Solution

If you are interested in the paths between node A1 and the identified Dx, you can rely on the following Cypher query.

MATCH path = (startNode:LabelA {name:'A1'})-[:BELONGS_TO*]->(endNode:LabelD) 
WITH collect(path) as paths, endNode WHERE size(paths) > 1 
UNWIND paths as path RETURN path;

(With a bow and thank to @InverseFalcon for the optimization idea.)

Result

╒═══════════════════════════════════════════════╕
│"path"                                         │
╞═══════════════════════════════════════════════╡
│[{"name":"A1"},{},{"name":"B1"},{"name":"B1"},{│
│},{"name":"C1"},{"name":"C1"},{},{"name":"D1"}]│
├───────────────────────────────────────────────┤
│[{"name":"A1"},{},{"name":"B2"},{"name":"B2"},{│
│},{"name":"C2"},{"name":"C2"},{},{"name":"D1"}]│
└───────────────────────────────────────────────┘

Cypher/neo4j return paths with a common starting and endpoint, with the endpoint filtered by number of paths

1 Answers

Your graph

Solution

Underlying idea

Result

Extension: Variant "Paths"

Solution

Result