I'm using Property Graph and Cypher from Neo4j. As described in the title, I'm trying to delete a number of nodes which can be reached from another node without passing other nodes and only have 1 incoming relationship. Here is the example of this case:
Each node has its label (big, bold character) and a property called nodeId
, which is unique among nodes. The labels of relationships are omitted because we cannot rely on it for some reasons. The nodeId
property is already indexed with a unique constraint.
Now, starting from node A {nodeId: 1}
, I want to delete it and all other nodes which:
- can be reached from
A {nodeId: 1}
without passing another A-label node. - only has 1 incoming relationship
So, the nodes will be deleted are: A {nodeId: 1}
, B {nodeId: 3}
, C {nodeId: 4}
, and C {nodeId: 8}
.
Below is my Cypher code:
MATCH p = (s:A {nodeId: 1 }) -[*1..10]-> (e)
WHERE NONE (x in NODES(p) WHERE x:A AND NOT x.nodeId = 1)
WITH s, e
MATCH (e) <-[r]- ()
WITH count(r) AS num_r, s, e
WHERE num_r < 2
DETACH DELETE e
DETACH DELETE s
The code works fine but as my graph grows, it becomes slower and slower. In the beginning, it takes less than 10 ms. But now, when I have around 1 million of nodes and 2 million of relationships, it takes more than 1 second.
What should I do to improve the performance of that code?
Thank you for your help.