I've been trying to import a relatively large dataset into Neo4j: approximately 50 million nodes with relationships.
I first experimented with Cypher via py2neo, which works but becomes very slow when you need to use CREATE UNIQUE or MERGE.
I'm now looking at other batch import methods, and I'm wondering which of these approaches is recommended for general workflow and speed:
- The Neo4j docs mention a batch insertion facility that appears to be Java-based and is part of the Neo4j distribution;
- There is also the batch inserter by Michael Hunger on GitHub; I am not sure how similar to or different from the one included in the distribution it is;
- Then there is also load2neo, which I'm currently testing;
- And then there is the LOAD CSV functionality that is part of Cypher in Neo4j v2, though I am not sure whether it is mainly a convenience and whether its performance is similar to just executing Cypher queries in batches of, say, 40 000 per Cypher transaction.
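For reference, the "batches of 40 000 per transaction" approach mentioned in the last item can be sketched roughly as below. This is only a minimal illustration of the batching itself, not of any particular driver: `execute_batch` is a hypothetical callable standing in for whatever opens a transaction, runs the Cypher statements for one batch, and commits (for example, via py2neo), and `chunked` and `import_in_batches` are names made up for this sketch.

```python
from itertools import islice

BATCH_SIZE = 40000  # the batch size mentioned above; tune as needed


def chunked(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_batches(rows, execute_batch, batch_size=BATCH_SIZE):
    """Send rows to `execute_batch` one batch at a time.

    `execute_batch` is a hypothetical callback: it is assumed to run all
    statements for the given batch inside a single transaction and commit.
    Returns the total number of rows handed over.
    """
    total = 0
    for batch in chunked(rows, batch_size):
        execute_batch(batch)
        total += len(batch)
    return total
```

Because `rows` can be a generator (for example, a CSV reader), the full 50 million rows never need to be held in memory at once; only one batch is materialized at a time.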
I would greatly appreciate any comments on functionality, workflow, and speed differences between these options.