2
votes

Initial Situation

  • Large Neo4j 3.4.6 graph with a tree-like structure (10 levels deep, 10 million nodes).
  • Unexceptional all nodes are connected with each other. The nodes as well as the relationships are in each case of the same type.
  • Exactly one central root node.
  • Reduced and simplified example:

Graphic representation

CREATE (Root:CustomType {name: 'Root'})
CREATE (NodeA:CustomType {name: 'NodeA'})
CREATE (NodeB:CustomType {name: 'NodeB'})
CREATE (NodeC:CustomType {name: 'NodeC'})
CREATE (NodeD:CustomType {name: 'NodeD'})
CREATE (NodeE:CustomType {name: 'NodeE'})
CREATE (NodeF:CustomType {name: 'NodeF'})
CREATE (NodeG:CustomType {name: 'NodeG'})
CREATE (NodeH:CustomType {name: 'NodeH'})
CREATE (NodeI:CustomType {name: 'NodeI'})
CREATE (NodeJ:CustomType {name: 'NodeJ'})
CREATE (NodeK:CustomType {name: 'NodeK'})
CREATE (NodeL:CustomType {name: 'NodeL'})
CREATE (NodeM:CustomType {name: 'NodeM'})
CREATE (NodeN:CustomType {name: 'NodeN'})
CREATE (NodeO:CustomType {name: 'NodeO'})
CREATE (NodeP:CustomType {name: 'NodeP'})
CREATE (NodeQ:CustomType {name: 'NodeQ'})

CREATE
  (Root)-[:CONTAINS]->(NodeA),
  (Root)-[:CONTAINS]->(NodeB),
  (Root)-[:CONTAINS]->(NodeC),
  (NodeA)-[:CONTAINS]->(NodeD),
  (NodeA)-[:CONTAINS]->(NodeE),
  (NodeA)-[:CONTAINS]->(NodeF),
  (NodeE)-[:CONTAINS]->(NodeG),
  (NodeE)-[:CONTAINS]->(NodeH),
  (NodeF)-[:CONTAINS]->(NodeI),
  (NodeF)-[:CONTAINS]->(NodeJ),
  (NodeF)-[:CONTAINS]->(NodeK),
  (NodeI)-[:CONTAINS]->(NodeL),
  (NodeI)-[:CONTAINS]->(NodeM),
  (NodeJ)-[:CONTAINS]->(NodeN),
  (NodeK)-[:CONTAINS]->(NodeO),
  (NodeK)-[:CONTAINS]->(NodeP),
  (NodeM)-[:CONTAINS]->(NodeQ);

To be solved challenge

  • By means of a MATCH-WITH-UNWIND Cypher query I’m successfully able to select a subtree and bind it to a path. Let’s say the subtree spans over the nodes A,E,F,I and J.
  • Based on this path I need all leaves of the subtree, not the complete tree now.

.

MATCH
  path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'}) /* simplified */
WITH
  nodes(path) as selectedPath
  /* here: necessary magic to identify the leaf nodes of the subtree */
RETURN
  leafNode;
  • Among other things I tried to solve the requirement with a WHERE NOT(node-->()) approach, but realized this works for leaves of the complete tree only. Unfortunately I was not able to convince the WHERE NOT(node-->()) clause to respect the selected subtree boundaries.
  • So, how can I find all leaves of a selected subgraph with Cypher and Neo4j? Can you please give me an advice how to solve this challenge? Many thanks in advance for pointing me into the right direction!
1

1 Answers

1
votes

You correctly noted that the check node with no children is suitable only for the entire tree. So you need to go through all the relationships in the subtree, and find such a node of the subtree that is as the end of the relationship, but not as the start of the relationship:

MATCH
  path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'})
UNWIND relationShips(path) AS r
WITH collect(DISTINCT endNode(r))   AS endNodes, 
     collect(DISTINCT startNode(r)) AS startNodes
UNWIND endNodes AS leaf
WITH leaf WHERE NOT leaf IN startNodes
RETURN leaf