Find and count all possible paths starting from one node not exceeding specific length in Cypher Neo4j

Question

I have a graph with several nodes connected with two possible relationships: NextExp and PrevExp. Each relationship has a specific ID, so it is possible to have multiple relationships between two nodes.

   ID1      ID1     ID1  
SD ---> SSD ---> LD ---> CEO 
   ID2      ID2
SD ---> SSD ---> LD
   ID2      ID2
SD ---> SSD ---> VF
            ID2
        SSD ---> BO
   ID3      ID3     ID3
SD ---> ST  ---> BO ---> CTO
   ID4      ID4
SD ---> ST  ---> MD
            ID5
        ST  ---> BB

I want to find (actually count) all possible paths with specific lengths starting from node SD, counting only a specific relationship (i.e. NextExp for instance). For the example above, if the max length is 3, I would like to have something like this:

Length   |       Paths       |   Count
2         SD --> SSD           2
          SD --> ST            2
3         SD --> SSD --> LD    2
          SD --> SSD --> VF    1
          SD --> ST  --> BO    1
          SD --> ST  --> MD    1

Being new in Neo4j, I tried using MATCH but could not find how to not specify an ending node when searching for paths.

In general something like: MATCH path = (start)-[*..3]-() UNWIND rels(path) as rel WITH path, rel WHERE type(rel) = 'NextExp' RETURN path, count(*), length(path) — Michael Hunger
Do "SD", "SSD', "ST, "LD", "CTO" etc represent labels of nodes? Also your sample output doesn't make much sense. If max length is 3 (as in 3 hops or relationships) then where are the CTO or CEO nodes? It's hard to tell where these counts are coming from. An actual example graph with some Cypher to create it would help make your issue clearer. — InverseFalcon

Pablissimo Pablissimo · Accepted Answer · 2019-08-29T16:24:38

I might be overcomplicating this, but given some sample data that looks close to your diagram:

MERGE (sd: Node { name: 'SD' })
MERGE (ssd: Node { name: 'SSD' })
MERGE (ld: Node { name : 'LD' })
MERGE (ceo: Node { name: 'CEO' })
MERGE (cto: Node { name: 'CTO' })
MERGE (st: Node { name: 'ST' })
MERGE (vf: Node { name: 'VF' })
MERGE (bo: Node { name: 'BO' })
MERGE (md: Node { name: 'MD' })
MERGE (bb: Node { name: 'BB' })
MERGE (sd)-[:NEXTEXP { id: 1 }]->(ssd)
MERGE (ssd)-[:NEXTEXP { id: 1 }]->(ld)
MERGE (ld)-[:NEXTEXP { id: 1 }]->(ceo)
MERGE (sd)-[:NEXTEXP { id: 2 }]->(ssd)
MERGE (ssd)-[:NEXTEXP { id: 2 }]->(ld)
MERGE (ssd)-[:NEXTEXP { id: 2 }]->(vf)
MERGE (ssd)-[:NEXTEXP { id: 2 }] ->(bo)
MERGE (sd)-[:NEXTEXP { id: 3 }]->(st)
MERGE (st)-[:NEXTEXP { id: 3 }]->(bo)
MERGE (bo)-[:NEXTEXP { id: 3 }]->(cto)
MERGE (sd)-[:NEXTEXP { id: 4 }]->(st)
MERGE (st)-[:NEXTEXP { id: 4 }]->(md)
MERGE (st)-[:NEXTEXP { id: 5 }]->(bb)

The following appears to yield the right sort of answer:

 MATCH path=(s: Node { name: 'SD' })-[:NEXTEXP*1..2]->(other: Node)
  WITH distinct nodes(path) as pathNodes, 
       relationships(path) as pathRels, 
       reduce(minValue = head(relationships(path)).id, x in relationships(path) | (case when x.id = minValue then x.id else null end)) as pathRelIds
 WHERE pathRelIds IS NOT NULL
  WITH pathNodes, count(pathNodes) as ct, size(pathNodes) as pathLength
RETURN pathLength as `Length`, pathNodes as `Paths`, ct as `Count`
 ORDER BY `Length`, `Paths`, `Count`

The query assumes that you need the ID of the relationships to be the same throughout the path. This seemed to be what you're asking since in your 'expected' table SD -> SSD -> LD has a count of 2, rather than 3.

Breaking it down a bit:

We match for between 1 and 2 relationships and assign the resulting paths to a variable called path
The reduce call in the second line tries to return either the smallest relationship ID within the path, or NULL if there are multiple relationship IDs - it's a bit of a cheat that relies on how Cypher treats NULL in expressions
The third line eliminates those cases where the previous reduce returned NULL - that is, we eliminate paths where there's more than one ID on the relationships in the path

For the exact output you're asking for (where the names of nodes are joined into a single string), the following amended query:

 MATCH path=(s: Node { name: 'SD' })-[:NEXTEXP*1..2]->(other: Node)
  WITH distinct s, 
       nodes(path) as pathNodes, 
       relationships(path) as pathRels, 
       reduce(minValue = head(relationships(path)).id, x in relationships(path) | (case when x.id = minValue then x.id else null end)) as pathRelIds
 WHERE pathRelIds IS NOT NULL
  WITH s, pathNodes, count(pathNodes) as ct, size(pathNodes) as pathLength
RETURN pathLength as `Length`, 
       reduce(pathStr = s.name, x in pathNodes[1..] | pathStr + ' --> ' + x.name) as `Paths`, 
       ct as `Count`
 ORDER BY `Length`, `Count` DESC, `Paths`

Yields the following output:

╒════════╤═══════════════════╤═══════╕
│"Length"│"Paths"            │"Count"│
╞════════╪═══════════════════╪═══════╡
│2       │"SD --> SSD"       │2      │
├────────┼───────────────────┼───────┤
│2       │"SD --> ST"        │2      │
├────────┼───────────────────┼───────┤
│3       │"SD --> SSD --> LD"│2      │
├────────┼───────────────────┼───────┤
│3       │"SD --> SSD --> BO"│1      │
├────────┼───────────────────┼───────┤
│3       │"SD --> SSD --> VF"│1      │
├────────┼───────────────────┼───────┤
│3       │"SD --> ST --> BO" │1      │
├────────┼───────────────────┼───────┤
│3       │"SD --> ST --> MD" │1      │
└────────┴───────────────────┴───────┘

Which differs from your expected only by the SD --> SSD --> BO row, which is probably due to me misinterpreting your diagram.

Find and count all possible paths starting from one node not exceeding specific length in Cypher Neo4j

1 Answers