I'm trying to insert a large number of nodes (~500,000) into a (non-embedded) Neo4j database by executing Cypher commands using the py2neo Python module (py2neo.cypher.execute). Eventually I need to remove the dependence on py2neo, but I'm using it for now until I learn more about Cypher and Neo4j.
I have two node types A and B, and the vast majority of nodes are of type A. There are two possible relationships r1 and r2, such that A-[r1]-A and A-[r2]-B. Each node of type A will have 0 - 100 r1 relationships, and each node of type B will have 1 - 5000 r2 relationships.
At the moment I am inserting nodes by building up large CREATE statements. For example I might have a statement
CREATE (:A {uid:1, attr:5})-[:r1]-(:A {uid:2, attr:5})-[:r1]-...
where ... might be another 5000 or so nodes and relationships forming a linear chain in the graph. This works okay, but it's pretty slow. I'm also indexing these nodes using
CREATE INDEX ON :A(uid)
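To make the chain-building step concrete, the CREATE statement shown earlier is assembled roughly like this. This is only a minimal sketch: `build_chain_statement` is a hypothetical helper name used for illustration, and the relationship pattern mirrors the statement exactly as written above. The resulting string is what gets passed to py2neo.cypher.execute.

```python
def build_chain_statement(nodes, label="A", rel_type="r1"):
    """Build one CREATE statement that links the nodes into a linear chain.

    nodes: list of (uid, attr) integer tuples, in chain order.
    label and rel_type default to the names used in this question.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # One "(:A {uid:..., attr:...})" fragment per node.
    parts = ["(:%s {uid:%d, attr:%d})" % (label, uid, attr) for uid, attr in nodes]
    # Join the fragments with the relationship pattern to form the chain.
    return "CREATE " + ("-[:%s]-" % rel_type).join(parts)


if __name__ == "__main__":
    # Three-node example; the real statements hold ~5000 nodes each.
    print(build_chain_statement([(1, 5), (2, 5), (3, 7)]))
```

Each call produces a single statement, so the whole chain is sent to the server as one very long query string.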
After I've added all the type A nodes, I add the type B nodes using CREATE statements again. Finally, I am trying to add the r2 relationships using a statement like
MATCH (c:B), (m:A) WHERE c.uid=1 AND (m.uid=2 OR m.uid=5 OR ...)
CREATE (m)-[:r2]->(c)
where ... could represent a few thousand OR conditions. This seems really slow, adding only a few relationships per second.
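For reference, the r2 statement above is generated along these lines. Again, this is just a sketch of the approach described, not a recommended solution: `build_r2_statement` is a hypothetical helper name, and uids are assumed to be integers.

```python
def build_r2_statement(b_uid, a_uids):
    """Build the MATCH/CREATE statement linking one B node to many A nodes.

    b_uid: uid of the type B node.
    a_uids: uids of the type A nodes that should get an r2 relationship to it.
    """
    if not a_uids:
        raise ValueError("need at least one A uid")
    # One "m.uid=N" test per A node, chained with OR (a few thousand in practice).
    conditions = " OR ".join("m.uid=%d" % uid for uid in a_uids)
    return (
        "MATCH (c:B), (m:A) WHERE c.uid=%d AND (%s)\n"
        "CREATE (m)-[:r2]->(c)" % (b_uid, conditions)
    )


if __name__ == "__main__":
    print(build_r2_statement(1, [2, 5]))
```

One such statement is built and executed per type B node, so the number of OR conditions grows with the number of r2 relationships that node has.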
So, is there a better way to do this? Am I completely off track here? I looked at this question, but it doesn't explain how to use Cypher to efficiently load the nodes. Everything else I've looked at seems to use Java, without showing the actual Cypher queries that could be used.