3
votes

I've been trying to import a relatively large dataset into Neo4j: approximately 50 million nodes with relationships.

I first experimented with Cypher via py2neo, which does work but becomes very slow if you need to use CREATE UNIQUE or MERGE.

I'm now looking at other batch import methods, and I'm wondering whether there are recommendations for which of these approaches is best in terms of general workflow and speed:

  • The Neo4j docs mention a batch insertion facility that appears to be a Java API and is part of the Neo4j distribution;
  • There is also the batch importer by Michael Hunger on GitHub; I am not sure how similar to or different from the one included in the distribution it is;
  • Then there is load2neo, which I'm currently testing;
  • And then there is the LOAD CSV functionality that is part of Cypher in Neo4j v2, though I am not sure whether it is mainly a convenience feature and whether its performance is similar to simply executing Cypher queries in batches of, say, 40,000 per Cypher transaction.
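For reference, the "batches of, say, 40,000" approach from the last bullet could be sketched roughly as follows. This is only an illustration using the Python standard library: `batched`, `import_in_batches`, and the `run_batch` callback are hypothetical names, and `run_batch` stands in for whatever code would actually send one batch to Neo4j in a single transaction (for example through py2neo).

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_in_batches(rows, run_batch, size=40000):
    """Call `run_batch` once per chunk and return the number of batches sent.

    `run_batch` is a hypothetical callback: it is assumed to execute the
    Cypher statements for one chunk and commit them as one transaction.
    """
    count = 0
    for chunk in batched(rows, size):
        run_batch(chunk)
        count += 1
    return count


# Example with a stand-in callback that only records the batch sizes.
sizes = []
n_batches = import_in_batches(range(100000), lambda chunk: sizes.append(len(chunk)))
```

Because `batched` consumes the input lazily, only one chunk is held in memory at a time, which matters at the scale of tens of millions of rows.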

I would greatly appreciate any comments on functionality, workflow, and speed differences between these options.

1
For what it's worth, I'm currently using Nigel Small's load2neo, and it appears to work well and fast. In my mind, the Geoff-formatted text file is easier to construct than a CSV file with all properties as column headers, etc. - songololo
I'm thinking about using load2neo instead of the CSV batch importers. Are they similar in performance? - Lucas Azevedo

1 Answer

1
votes

If you can use the latest version of Neo4j, the recommended way is to use the new LOAD CSV statement in Cypher: http://docs.neo4j.org/chunked/stable/cypherdoc-importing-csv-files-with-cypher.html
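As a rough illustration of that workflow, the sketch below writes a node file with a header row and builds a matching LOAD CSV statement as a string. It assumes a Neo4j 2.x-era server, where `USING PERIODIC COMMIT` is available to commit in chunks during the load. The helper names, the `Person` label, and the `id`/`name` columns are made up for the example, and the statement would still have to be run from the Neo4j shell, the browser, or a driver.

```python
import csv
import os
import tempfile


def write_nodes_csv(path, rows, fieldnames):
    """Write one header line followed by one line per node."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_csv_statement(file_url, label, fieldnames, commit_every=1000):
    """Build a LOAD CSV statement that creates one node per CSV row."""
    # LOAD CSV reads every value as a string, so each property is a string
    # unless it is converted explicitly inside the Cypher statement.
    props = ", ".join("{0}: row.{0}".format(name) for name in fieldnames)
    return (
        "USING PERIODIC COMMIT {n}\n"
        "LOAD CSV WITH HEADERS FROM '{url}' AS row\n"
        "CREATE (:{label} {{{props}}})"
    ).format(n=commit_every, url=file_url, label=label, props=props)


# Hypothetical sample data; real input would come from the source dataset.
fields = ["id", "name"]
csv_path = os.path.join(tempfile.mkdtemp(), "people.csv")
write_nodes_csv(csv_path, [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}], fields)

# On POSIX systems an absolute path yields a file:///... URL here.
statement = load_csv_statement("file://" + csv_path, "Person", fields)
print(statement)
```

If uniqueness matters, `CREATE` could be swapped for `MERGE` on an indexed property, at some cost in speed, which is the same trade-off raised in the question.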