Cypher query to return or keep only the final sequence when variable-length relationship identifiers are used

Question

When a variable-length relationship identifier is used, is there a way to keep or return only the final, full sequences of nodes instead of all subpaths, so that further operations can be run on each full sequence path?

MATCH path = (S:Person)-[rels:NEXT*]->(E:Person)................

For example: find all sequences of nodes whose names are in a given list, say ['graph', 'server', 'db'], where every relationship in the sequence has the same 'seqid' property.

i.e.

(graph)->(server)->(db) with same seqid: 1

(graph)->(db)->(server) with same seqid: 1 // there can be another matching sequence with the same seqid

(graph)->(db)->(server) with same seqid: 2

Is there a way to keep only the final sequence of nodes, say '(graph)->(server)->(db)', for each sequence, instead of every subpath of a larger sequence such as (graph)->(server) or (server)->(db)?

Please help me solve this.

(I am using Neo4j 2.3.6 Community Edition via the Java API in embedded mode.)

Can you share the Cypher queries that are matching on all sequences? — InverseFalcon
Also, do you have any requirements on what to do if the same endpoint node can be reached via multiple valid paths? Also I'm not seeing S being treated as some unique start node, it's matching on all :Person nodes. Is this kind of query meant to be applied to all :Persons, or from a single (or group) of start nodes? — InverseFalcon
Is it fair to assume that you want your starting node to not have an incoming :NEXT relationship? And if not, what kind of criteria should be used to determine the starting node of a sequence? — InverseFalcon
There is no need to give a starting node. I want to retrieve all sequences of nodes whose names are in the given list and whose relationships all have the same 'seqid' property. MATCH p=(a)-[rels:NEXT*]->(b) WHERE ALL(n IN nodes(p) WHERE n.name IN ['graph', 'server', 'db']) AND ALL(r IN rels WHERE rels[0]['seqid'] = r.seqid) RETURN p. This query returns all sequences with the same 'seqid', but I only want the final sequence of nodes for each of them, not each of the subpaths. — Soumya George
The only way I know to do this currently is by defining the characteristics of your start and end nodes (so they can't traverse before the start node or beyond the end node without breaking predicates) then finding the paths between them. Or, to avoid cartesian product, define start nodes first, then match with variable length relationships with predicates, adding an additional predicate on your end node of the pattern so it can only match on the end node of the sequence and not earlier. — InverseFalcon

InverseFalcon · Accepted Answer · 2016-08-20T18:36:22

What we could really use here is a longestSequences() function that would do exactly what you want: expand the pattern so that a and b are always matched to the start and end points of a sequence, and no matched pattern is a subset of any other matched pattern.

I created a feature request on Neo4j for exactly this: https://github.com/neo4j/neo4j/issues/7760

Until that gets implemented, we'll have to make do with an alternate approach. I think we'll have to add additional matching to restrict a and b to the start and end nodes of full sequences.

Here's my proposed query:

WITH ['graph', 'server', 'db'] AS names
MATCH p=(a)-[rels:NEXT*]->(b)
WHERE ALL(n IN nodes(p) WHERE n.name IN names)
AND ALL(r IN rels WHERE rels[0]['seqid'] = r.seqid)
WITH names, p, a, rels, b
// check if b is a subsequence node instead of an end node
OPTIONAL MATCH (b)-[rel:NEXT]->(c)
WHERE c.name IN names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where b is a subsequence node
WITH names, p, a, rels, b, c
WHERE c IS NULL
WITH names, p, a, rels, b
// check if a is a subsequence node instead of a start node
OPTIONAL MATCH (d)-[rel:NEXT]->(a)
WHERE d.name IN names
AND rel.seqid = rels[0]['seqid']
// remove any existing matches where a is a subsequence node
WITH p, a, b, d
WHERE d IS NULL
RETURN p, a AS startNode, b AS endNode
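
To try the query on something small, here is a minimal sample graph. This data is my own assumption for illustration and is not from the question: three unlabeled nodes named as in the list, joined by one chain with seqid 1 and another chain with seqid 2.

```cypher
// Hypothetical sample data, not from the question.
// Chain with seqid 1: (graph)->(server)->(db)
// Chain with seqid 2: (graph)->(db)->(server)
CREATE (g {name: 'graph'}),
       (s {name: 'server'}),
       (d {name: 'db'}),
       (g)-[:NEXT {seqid: 1}]->(s),
       (s)-[:NEXT {seqid: 1}]->(d),
       (g)-[:NEXT {seqid: 2}]->(d),
       (d)-[:NEXT {seqid: 2}]->(s)
```

On this graph, the unfiltered MATCH from the comments should return six paths: four single-relationship subpaths and the two full chains. The query above should keep only the two full chains, (graph)->(server)->(db) for seqid 1 and (graph)->(db)->(server) for seqid 2. Each two-node subpath is dropped because it can be extended at one end by a NEXT relationship with the same seqid.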
