2
votes

This is unrelated to how to find all the longest paths with cypher query?, and somewhat related to the answer in Find all relationship disjoint longest paths in cypher/traversal API ordered by size except my condition is a bit different:

I would like to write a cypher query that returns paths whose collections of nodes are "unique", in the sense that no two nodes in the path share the same name property.

Here's an example graph: enter image description here

and the cypher to make it:

CREATE (a1:Node {name: 'A', time:1}),
(c1:Node {name: 'C', time:2}),
(b1:Node {name: 'B', time:3}),
(d1:Node {name: 'D', time:4}),
(c2:Node {name: 'C', time:5}),
(a2:Node {name: 'A', time:6}),
(a3:Node {name: 'A', time:7}),
(b2:Node {name: 'B', time:8}),
(d2:Node {name: 'D', time:9})

CREATE (a1)-[:NEXT]->(b1)-[:NEXT]->(c2)-[:NEXT]->(a2)-[:NEXT]->(b2),
(a2)-[:NEXT]->(a3)-[:NEXT]->(d2),
(c1)-[:NEXT]->(b1),
(d1)-[:NEXT]->(c2)

RETURN *

In this graph, the following paths are considered longest, and unique:

(a1)-->(b1)-->(c2)
(c1)-->(b1)
(d1)-->(c2)-->(a2)-->(b2)
(a3)-->(d2)

Some example of paths that should not be returned from the query are

  1. (c1)-->(b1)-->(c2) since it contains two instances of nodes with name, "C"
  2. (a2)-->(b2) since it is contained in the larger path (d1)-->(c2)-->(a2)-->(b2)

Any help would be much appreciated.

1

1 Answers

4
votes
//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH path, nodes(path) as nodes, 
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path to the collection
WITH collect({f: id(HEAD(nodes)), // Fist node in path
              l: id(LAST(nodes)), // Last node in path
              ns: EXTRACT(n in nodes(path) | id(n)), 
              p: path
     }) + {f: NULL, l: NULL, ns: [NULL]} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  UNWIND RANGE(i+1,size(paths)-1) as j
    WITH i, paths, paths[i]['p'] as path, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['f']) )) as fc, 
         sum( size( FILTER(n in paths[j]['ns'] WHERE n=paths[i]['l']) )) as fl
    WHERE fl=0 OR fc=0
//
// Return all appropriate ways
RETURN path ORDER BY length(path)

Upd: (let's try to add elegance)

//
// Find all path:
MATCH path = (a:Node)-[:NEXT*1..]->(b:Node)
//
// Check that the names of the nodes in a path is unique
UNWIND nodes(path) as node
WITH a, b, path,
     collect(DISTINCT node.name) as unames
  //
  // And sort by path length
  ORDER BY length(path) ASC
  WHERE length(path)+1 = size(unames)
//
// Putting the appropriate path and first and last nodes to the collection
WITH collect({first: a, last: b, path: path}) + {} as paths
//
// Loop through the paths in a double loop 
// and check that the start and end nodes 
// are not included in the following ascending path
UNWIND RANGE(0,size(paths)-2) as i
  WITH i, paths[i]['path'] as path, paths
    UNWIND RANGE(i+1,size(paths)-1) as j
      WITH path,
           collect(distinct paths[i]['first'] IN nodes(paths[j]['path'])) as cFirst,
           collect(distinct paths[i]['last']  IN nodes(paths[j]['path'])) as cLast
        WHERE (size(cFirst)=1 AND cFirst[0] = false) OR
              (size(cLast )=1 AND cLast [0] = false)
//
// Return all appropriate ways
RETURN path ORDER BY length(path)