I'm evaluating Neo4j Community 2.1.3 for storing a list of concepts and the relationships between them. I'm trying to load my sample test data (CSV files) into Neo4j using Cypher from the web interface, as described in the online manual.
My data looks something like this:
concepts.csv
id,concept
1,tree
2,apple
3,grapes
4,fruit salad
5,motor vehicle
6,internal combustion engine
relationships.csv
sourceid,targetid
2,1
4,2
4,3
5,6
6,5
And so on... My sample has ~17K concepts and ~16M relationships. Following the manual, I started the Neo4j server and ran this Cypher statement:
LOAD CSV WITH HEADERS FROM "file:///data/concepts.csv" AS csvLine
CREATE (c:Concept { id: csvLine.id, concept: csvLine.concept })
This worked fine and loaded my concepts. Then I tried to load my relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data/relationships.csv" AS csvLine
MATCH (c1:Concept { id: csvLine.sourceid }),(c2:Concept { id: csvLine.targetid })
CREATE (c1)-[:RELATED_TO]->(c2)
This would run for an hour or so, but always stopped with either:
- "Unknown error" (no other info!), or
- "Neo.TransientError.Transaction.DeadlockDetected" with a detailed message like "LockClient[695] can't wait on resource RWLock[RELATIONSHIP(572801), hash=267423386] since => LockClient[695] <-[:HELD_BY]- RWLock[NODE(4145), hash=1224203266] <-[:WAITING_FOR]- LockClient[691] <-[:HELD_BY]- RWLock[RELATIONSHIP(572801), hash=267423386]"
It would stop after loading maybe 200-300K relationships. I've done a "sort | uniq" on the relationships.csv so I'm pretty sure there are no duplicates. I looked at the log files in data/log but found no error message.
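A minimal sketch of that kind of duplicate check, assuming the file has exactly one header row. The function name `count_duplicate_rows` is made up for illustration and is not the exact command I ran:

```shell
# Hypothetical helper: count data rows that occur more than once in a CSV file.
# Assumes the first line is a header and skips it.
# Prints 0 when every data row is unique.
count_duplicate_rows() {
  tail -n +2 "$1" | sort | uniq -d | wc -l | tr -d ' '
}

# Usage (path is an assumption):
#   count_duplicate_rows relationships.csv
```

Note that this compares whole rows, so a reversed pair such as 5,6 and 6,5 counts as two distinct relationships, which is what I want here.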
Has anyone seen this before? BTW, I don't mind losing a small portion of the relationships, so I'd be happy if I could just turn off ACID transactions. I'd also like to avoid writing code (to use the Java API) at this stage; I just want to load my data and try it out. Is there any way to do this?
My full data set will have millions of concepts and maybe hundreds of millions of relationships. Does anyone know if Neo4j can handle this amount of data?
Thank you.