I have a database populated with about 81MB of CSV data.

The data has some implicit relationships that I wanted to explicitly create, so I ran the following command:

with range(0,9) as numbers 
unwind numbers as n
match (ks:KbWordSequence) where ks.kbid ends with tostring(n)
match (kt:KbTextWord {kbid: ks.kbid})
create (kt)-[:SEQUENCE]->(ks)
create (ks)-[:TEXT]->(kt)

On running the code I started to see lots of these messages in the .log file:

2016-03-19 19:27:30.740+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 9149ms.

After seeing these GC messages for a while, and seeing the process take up 6 GB of RAM, I killed the Windows process and went to try creating the relationships again.

When I did that I got the following error and the database wouldn't start.

Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1dc6ce1' was successfully initialized, but failed to start. Please see attached cause exception.

There's no error in the .log file or any other corresponding message I can see.

Other reports of this kind of error correspond to a Neo4j database version mismatch, which isn't the case in my situation.

How would I recover from this condition?

1 Answer

I guess the transaction grows too large, since this statement triggers a global operation across the whole graph. First, understand the size of the intended operation:

with range(0,9) as numbers 
unwind numbers as n
match (ks:KbWordSequence) where ks.kbid ends with tostring(n)
match (kt:KbTextWord {kbid: ks.kbid})
return count(*)

As a rule of thumb, ~10k to 100k atomic operations is a good transaction size. With that in mind, apply skip and limit to control the transaction size:

with range(0,9) as numbers 
unwind numbers as n
match (ks:KbWordSequence) where ks.kbid ends with tostring(n)
match (kt:KbTextWord {kbid: ks.kbid})
with ks, kt skip 0 limit 50000
create (kt)-[:SEQUENCE]->(ks)
create (ks)-[:TEXT]->(kt)
return count(*)

and run this statement repeatedly, increasing skip by 50000 each time, until it returns a count of 0.

Depending on the actual use case there might be even more efficient approaches that avoid skip altogether and instead detect the not-yet-processed nodes directly in the match, as sketched below.
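
For example, a minimal sketch of that idea (assuming, as in the question, that a pair counts as not yet processed while its SEQUENCE relationship is missing):

match (ks:KbWordSequence)
match (kt:KbTextWord {kbid: ks.kbid})
// only pairs that have not been linked yet
where not (kt)-[:SEQUENCE]->(ks)
// cap the transaction size
with ks, kt limit 50000
create (kt)-[:SEQUENCE]->(ks)
create (ks)-[:TEXT]->(kt)
return count(*)

Re-running this statement until it returns 0 works through the whole data set in bounded transactions, without having to track a skip offset.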