We have a graph consisting of data sets (~500) and users (~15). When I tried to clear both sets of nodes using the following query, the memory usage of Neo4j (v2.3.1) rose to over 1.5 GB and the query was slow.

MATCH (ds:DataSet), (u:User)
OPTIONAL MATCH (ds)-[r1]-(), (u)-[r2]-()
DELETE ds, u, r1, r2

Surprisingly, splitting the query into the following two queries:

MATCH (ds:DataSet) OPTIONAL MATCH (ds)-[r]-() DELETE ds, r
MATCH (u:User) OPTIONAL MATCH (u)-[r]-() DELETE u, r

kept the memory at ~240 MB. For comparison, memory consumption right after startup is ~230 MB.

My question is whether there is a conceptual issue with the first Cypher query. Is it supposed to be very inefficient to delete multiple sets of nodes at the same time?

tl;dr:

The two node sets (users and data sets) do not overlap but are linked together, i.e. a user node can be connected to a data set node via relationships.

2 Answers

1
votes

If you profile both queries, you'll find that the first one produces a Cartesian product of DataSet and User nodes, because the two patterns in the MATCH clause are disconnected (even though the nodes may be related in the underlying graph, the pattern does not express this).

The better-performing queries do no such thing: each finds its nodes with a label scan and deletes them.
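
To check this yourself, a minimal sketch (assuming the DataSet and User labels from the question) is to prefix the disconnected MATCH with PROFILE and count the rows it produces before anything is deleted:

```cypher
// Profile only the disconnected MATCH from the first query.
// The plan should contain a CartesianProduct operator, and the
// count is (number of DataSet nodes) x (number of User nodes).
PROFILE
MATCH (ds:DataSet), (u:User)
RETURN count(*) AS rows
```

With roughly 500 data sets and 15 users, that is on the order of 7,500 rows before the OPTIONAL MATCH runs, and every one of those rows is then expanded again by the relationships of both ds and u.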

1
votes

It turns out the problem was the Cartesian product of (ds)-[r1]-() and (u)-[r2]-().
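
If a single statement is preferred, one possible sketch that avoids combining the two patterns is to match one node variable and filter on either label. It uses DETACH DELETE, which was added in Neo4j 2.3, to remove each node together with its relationships. This is an alternative formulation, not something either query in the question does, and it may be planned as a scan over all nodes rather than two label scans:

```cypher
// One variable, so no pairing of DataSet rows with User rows.
// DETACH DELETE removes the node and all of its relationships.
MATCH (n)
WHERE n:DataSet OR n:User
DETACH DELETE n
```

The two separate queries from the question can be shortened the same way, e.g. MATCH (ds:DataSet) DETACH DELETE ds.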