
I'm evaluating Neo4j Community 2.1.3 for storing a list of concepts and the relationships between them. I'm trying to load my sample test data (CSV files) into Neo4j using Cypher from the web interface, as described in the online manual.

My data looks something like this:

concepts.csv

id,concept
1,tree
2,apple
3,grapes
4,fruit salad
5,motor vehicle
6,internal combustion engine

relationships.csv

sourceid,targetid
2,1
4,2
4,3
5,6
6,5

And so on. My sample has ~17K concepts and ~16M relationships. Following the manual, I started the Neo4j server and entered this Cypher query:

LOAD CSV WITH HEADERS FROM "file:///data/concepts.csv" AS csvLine 
CREATE (c:Concept { id: csvLine.id, concept: csvLine.concept })

This worked fine and loaded my concepts. Then I tried to load my relationships.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data/relationships.csv" AS csvLine
MATCH (c1:Concept { id: csvLine.sourceid }),(c2:Concept { id: csvLine.targetid })
CREATE (c1)-[:RELATED_TO]->(c2)

This runs for an hour or so, but always stops with one of:

  • "Unknown error" (no other info!), or
  • "Neo.TransientError.Transaction.DeadlockDetected" with a detailed message like "LockClient[695] can't wait on resource RWLock[RELATIONSHIP(572801), hash=267423386] since => LockClient[695] <-[:HELD_BY]- RWLock[NODE(4145), hash=1224203266] <-[:WAITING_FOR]- LockClient[691] <-[:HELD_BY]- RWLock[RELATIONSHIP(572801), hash=267423386]"

It stops after loading maybe 200-300K relationships. I've run "sort | uniq" on relationships.csv, so I'm fairly sure there are no duplicates. I looked at the log files in data/log but found no error message.

Has anyone seen this before? I don't mind losing a small portion of the relationships, so I'd be happy if I could simply turn off ACID transactions. I'd also like to avoid writing code (to use the Java API) at this stage; I just want to load my data and try it out. Is there any way to do this?

My full data set will have millions of concepts and maybe hundreds of millions of relationships. Does anyone know whether Neo4j can handle this amount of data?

Thank you.

1 Answer


Your approach is correct. Are you using neo4j-shell or the browser?

Did you create an index first, with create index on :Concept(id); ?
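
For reference, a minimal sketch of creating that index before the relationship load. It assumes the Concept label and id property from the question, and uses the index syntax of the 2.x line:

```
// Schema index on the property used by both MATCH patterns in the load query.
CREATE INDEX ON :Concept(id);
```

The index is populated in the background, so it is worth waiting until it is online before starting the load. As far as I know, the schema command in neo4j-shell (or :schema in the browser) lists the indexes and their state.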

Without an index, each lookup of a concept has to scan every node with this label and compare its id value, so the cost of a single lookup grows linearly with the number of concepts. With ~17K concepts and two lookups for each of the ~16M rows, that adds up to a very slow import. You can also check whether both matches use the index by prefixing your query with PROFILE.
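
A quick way to check a single lookup, sketched under the assumption that the ids were stored as strings (LOAD CSV reads every field as a string, and the question's CREATE does not convert them). The id value "2" is just one of the sample rows:

```
// Profile one lookup of the kind the load query performs for every CSV row.
PROFILE
MATCH (c:Concept { id: "2" })
RETURN c;
```

If the index is being used, the plan should show an index lookup rather than a scan over all Concept nodes. The exact operator names in the output differ between versions, so treat this only as a rough check.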

I have never seen that deadlock before, despite importing millions of relationships. Can you share the full stack trace? If you use the shell, you might want to run export STACKTRACES=true first.

Can you try USING PERIODIC COMMIT 1000?
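
That would look roughly like this. It is the query from the question with only the batch size added; I have not tested whether it avoids the deadlock on your data:

```
// Commit after every 1000 CSV rows instead of using the default batch size.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///data/relationships.csv" AS csvLine
MATCH (c1:Concept { id: csvLine.sourceid }),(c2:Concept { id: csvLine.targetid })
CREATE (c1)-[:RELATED_TO]->(c2);
```

Smaller batches keep each transaction short, at the cost of more commits overall.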