I'm running Neo4j Community Edition v2.3.2 on a server with 32GB of memory, configured to use 8GB as page cache, 18GB as heap, and the remaining 6GB for the OS (from my understanding of Neo4j, this mostly ends up as filesystem cache for the index files).
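For reference, this is roughly what I have in the config files (reproduced from memory, so treat the exact values as approximate):
# conf/neo4j.properties
dbms.pagecache.memory=8g
# conf/neo4j-wrapper.conf
wrapper.java.initmemory=18000
wrapper.java.maxmemory=18000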
My DB has around 250M Person nodes and 1B relationships, and I'm running the following query to delete some nodes:
MATCH (p:Person)
WHERE NOT HAS (p.fullname)
OPTIONAL MATCH (prev:Person {fullname: p.fname +' '+p.lname})
WHERE prev IS NOT NULL
WITH p LIMIT 100000
DETACH DELETE p
I've been playing with this a lot to get it to actually work without hogging all the memory and crashing the server. I must say I'm a bit disappointed that I can only delete 100K nodes at a time; I expected something like a couple of million, but that's probably a matter for a different question.
I constantly get the following error:
ERROR (-v for expanded information):
TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:121)
at org.neo4j.shell.kernel.apps.TransactionProvidingApp.execute(TransactionProvidingApp.java:263)
at ...
Caused by: org.neo4j.kernel.impl.store.InvalidRecordException: RelationshipRecord[250054319] not in use
at org.neo4j.kernel.impl.store.RelationshipStore.fillRecord(RelationshipStore.java:135)
at org.neo4j.kernel.impl.store.RelationshipStore.getRecord(RelationshipStore.java:93)
at org.neo4j.kernel.impl.store.RelationshipStore.getRecord(RelationshipStore.java:88)
at org.neo4j.kernel.impl.transaction.state.Loaders$3.load(Loaders.java:139)
at ...
I couldn't find any relationship with ID 250054319. I could find a node with that ID, but I'm not sure it has anything to do with my query, as it doesn't satisfy my initial condition (NOT HAS(p.fullname)).
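In case it matters, this is roughly how I looked it up (a quick sketch of the queries, typed from memory):
// no relationship with that id, this comes back empty
MATCH ()-[r]-() WHERE id(r) = 250054319 RETURN r
// a node with that id does exist, but it has a fullname,
// so it doesn't satisfy the NOT HAS(p.fullname) condition
MATCH (n) WHERE id(n) = 250054319 RETURN n, labels(n), has(n.fullname)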
I'm confused. Has anyone seen this error before? What does it mean, and what should I do to solve it?
It may be helpful to mention that I'm running this deletion as a one-time task, so I'm open to creative ways of making this query run.
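For example, one variant I'd be willing to try is marking the candidates once with a temporary label and then deleting them in small, repeatable batches. This is only a rough sketch (the ToDelete label is something I made up, and pass 1 is still a single large transaction, so it might need batching of its own):
// Pass 1: mark the candidates once
MATCH (p:Person)
WHERE NOT HAS(p.fullname)
OPTIONAL MATCH (prev:Person {fullname: p.fname + ' ' + p.lname})
WITH p, prev
WHERE prev IS NOT NULL
SET p:ToDelete
// Pass 2: re-run until it deletes 0 nodes
MATCH (p:ToDelete)
WITH p LIMIT 100000
DETACH DELETE p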
Another thing that might help: I had a lot of memory problems even after breaking the query up with the LIMIT clause. The server hung, crashed, or went into constant GC cycles until I found the configuration described above, which let it work for a while, until I started running into this error.
Thanks!