I am trying to remove leaf nodes in Neo4j, but only those with a single incoming relationship. (I'm so close.)

I have a query that returns the exact node I wish to remove. However, when I replace the RETURN with DELETE, it removes more than the query returns. Here's the complete sequence:

neo4j-sh (?)$ match (n)-[r]->(p)  return n, r, p ;
+------------------------------------------------------------+
| n                    | r            | p                    |
+------------------------------------------------------------+
| Node[2164]{name:"a"} | :has[2616]{} | Node[2165]{name:"b"} |
| Node[2164]{name:"a"} | :has[2617]{} | Node[2166]{name:"c"} |
| Node[2166]{name:"c"} | :has[2619]{} | Node[2168]{name:"e"} |
| Node[2167]{name:"d"} | :has[2618]{} | Node[2165]{name:"b"} |
+------------------------------------------------------------+

This query is perfect:

neo4j-sh (?)$ match ()-[r:has]->(n)
>   with n,count(r) as rel_cnt
>   where rel_cnt = 1 and NOT (n)-->()
>   return n.name, rel_cnt;
+------------------+
| n.name | rel_cnt |
+------------------+
| "e"    | 1       |
+------------------+

But this delete removed 2 nodes and 3 relationships?

neo4j-sh (?)$ match ()-[r:has]->(n)
>   with n, r, count(r) as rel_cnt
>   where rel_cnt = 1 and NOT (n)-->()
>   delete n, r;
+-------------------+
| No data returned. |
+-------------------+
Nodes deleted: 2
Relationships deleted: 3

This is all that's left:

neo4j-sh (?)$ match (n)-[r]->(p)  return n, r, p ;
+------------------------------------------------------------+
| n                    | r            | p                    |
+------------------------------------------------------------+
| Node[2164]{name:"a"} | :has[2617]{} | Node[2166]{name:"c"} |
+------------------------------------------------------------+
neo4j-sh (?)$ match (n) return n;
+----------------------+
| n                    |
+----------------------+
| Node[2169]{name:"a"} |
| Node[2171]{name:"c"} |
| Node[2172]{name:"d"} |
+----------------------+

Why was node 'b' removed? It didn't show in the query results.

1 Answer

The two queries differ in more than RETURN versus DELETE. The RETURN query carries n, count(r) into the second part of the query, whereas the DELETE query carries n, r, count(r). To see the effect, run the DELETE query with a RETURN in place of the DELETE:

neo4j-sh (?)$ match ()-[r:has]->(n)
>   with n, r, count(r) as rel_cnt
>   where rel_cnt = 1 and NOT (n)-->()
>   // delete n, r;
>   return *;

and you'll get something like

+-----------------------------------------------+
| n                    | r            | rel_cnt |
+-----------------------------------------------+
| Node[2165]{name:"b"} | :has[2616]{} | 1       |
| Node[2165]{name:"b"} | :has[2618]{} | 1       |
| Node[2168]{name:"e"} | :has[2619]{} | 1       |
+-----------------------------------------------+

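The results differ because an aggregate such as count(r) is grouped by every non-aggregated item that is piped alongside it. Piping n, count(r) means something like "count r per n", and only 'e' satisfies both rel_cnt = 1 and NOT (n)-->() ('c' also has a count of 1, but it has an outgoing relationship). Piping n, r, count(r) means something like "count r per n and per r", and if you count something grouped by itself, the result is 1 every time. That renders rel_cnt = 1 useless, so the only filter left is NOT (n)-->(). Node 'b' has no outgoing relationships, so it was deleted together with both of its incoming relationships. Anything that was not deleted was either never matched or was excluded by that remaining filter criterion.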
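The grouping behaviour can be checked on its own, without any filters or deletes. The following is a minimal sketch, assuming the same :has graph as in the question; it only reads data, and the two statements differ solely in whether r appears next to the aggregate.

```cypher
// Grouped by n only: one row per node, and rel_cnt is the
// number of incoming :has relationships of that node.
MATCH ()-[r:has]->(n)
RETURN n.name, count(r) AS rel_cnt;

// Grouped by n and r: one row per relationship, so each group
// contains exactly one r and rel_cnt is 1 on every row.
MATCH ()-[r:has]->(n)
RETURN n.name, r, count(r) AS rel_cnt;
```

On the question's data the first statement should report 2 for 'b' and 1 for 'c' and 'e', while the second should report 1 on all four rows.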


If you want to count the relationships first and then conditionally delete them, collect them, filter on the size of the collection, and then delete from the collection. Try something like:

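// Group by n only, keeping its incoming :has relationships in a collection
MATCH ()-[r:has]->(n)
WITH n, collect(r) AS rr
// size() is used for the collection; newer Cypher versions reserve length() for paths
WHERE size(rr) = 1 AND NOT (n)-->()
// Delete the collected relationship first, then the node itself
FOREACH (r IN rr | DELETE r)
DELETE n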
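
A possible alternative, sketched under the assumption that the server is Neo4j 2.3 or later (the shell output in the question does not show the version): DETACH DELETE removes a node together with its relationships, so r never has to be carried past the aggregation and the grouping pitfall does not arise.

```cypher
// Assumes Neo4j 2.3+, where DETACH DELETE is available.
// Group by n only, so rel_cnt really is "incoming :has relationships per node".
MATCH ()-[r:has]->(n)
WITH n, count(r) AS rel_cnt
WHERE rel_cnt = 1 AND NOT (n)-->()
// Removes n and every relationship still attached to it.
DETACH DELETE n
```

One design difference is worth noting: DETACH DELETE drops all relationships attached to n, including incoming relationships of other types that the :has count did not see, whereas the collect-based query above would fail rather than delete a node that still has such relationships.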