0
votes

I'm running Neo4J, community edition v2.3.2 on a server with 32GB memory, configured to use 8GB as pagecache, 18GB as heap and the other 6GB for the OS (from my understanding of Neo4J, this mostly goes for page cache for the index files).

My DB has around 250M Person nodes and 1B relations and I'm running the following query to delete some nodes:

MATCH (p:Person) 
WHERE NOT HAS (p.fullname) 
OPTIONAL MATCH (prev:Person {fullname: p.fname +' '+p.lname}) 
WHERE prev IS NOT NULL 
WITH p LIMIT 100000 
DETACH DELETE p

I've been playing a lot with this to get it to actually work without hogging all memory and crashing the server. I must say I'm a bit disappointed that I can only delete 100K nodes at a time. Expected something like a couple of millions, but thats probably for a different question.

I constantly get the following error:

ERROR (-v for expanded information):
        TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
    at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:121)
    at org.neo4j.shell.kernel.apps.TransactionProvidingApp.execute(TransactionProvidingApp.java:263)
    at ...

Caused by: org.neo4j.kernel.impl.store.InvalidRecordException: RelationshipRecord[250054319] not in use
    at org.neo4j.kernel.impl.store.RelationshipStore.fillRecord(RelationshipStore.java:135)
    at org.neo4j.kernel.impl.store.RelationshipStore.getRecord(RelationshipStore.java:93)
    at org.neo4j.kernel.impl.store.RelationshipStore.getRecord(RelationshipStore.java:88)
    at org.neo4j.kernel.impl.transaction.state.Loaders$3.load(Loaders.java:139)
    at ... 

I couldn't find any relationship with ID of 250054319. I could find a node with that ID but I'm not sure it has anything to do with my query as it doesn't satisfy my initial condition (NOT HAS p.fullname).

I'm confused. Does anyone know this error? What does it mean and what should I do to solve it?

Maybe helpful to say that I'm running this deletion as a one-time task and therefore open to creative ways of making this query run.

Another thing that might help is that I had a lot of memory problems, even after breaking the query with the LIMIT clause. The server hanged, crashed or got into constant GC cycles until I was able to find this configuration that allowed it to work a bit until I start experiencing this problem.

Thanks!

1
Would be great if you could include the full stack trace in the question alsoMattias Finné

1 Answers

0
votes

I have 2 comments:

  1. Your query is not doing what you intended. It will delete all Person nodes that do not have a fullname property, even if there is no matching prev node. This is because the OPTIONAL MATCH clause (and the subsequent WHERE) do not have to match in order for the query to proceed to the DELETE clause.

  2. You are probably getting a lot of errors because DETACH DELETE p will attempt to delete all relationships for each p, even if any of those relationships had already been deleted. For example, if 2 people had a relationship between them, then the attempt to DETACH DELETE the second person would cause an error.

Here is a query that might solve both of these problems:

MATCH (p:Person)
WHERE NOT HAS (p.fullname)
OPTIONAL MATCH (prev:Person { fullname: p.fname +' '+p.lname })
WITH CASE WHEN prev IS NOT NULL THEN p ELSE NULL END AS delP
LIMIT 100000
OPTIONAL MATCH (delP)-[r]-()
WITH COLLECT(delP) AS ps, COLLECT(DISTINCT r) AS rels
FOREACH (r IN rels | DELETE r)
FOREACH (p IN ps | DELETE p)