I have several CSV files that range from 25-100 MB in size. I have created constraints, created indices, am using periodic commit, and increased the allocated memory in the neo4j-wrapper.conf and neo4j.properties.
neo4j.properties:
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M
neo4j-wrapper.conf changes:
wrapper.java.initmemory=5000
wrapper.java.maxmemory=5000
However my load is still taking a very long time, and I am considering using the recently released Import Tool (http://neo4j.com/docs/milestone/import-tool.html). Before I switch to it, I was wondering whether I could be doing anything else to improve the speed of my imports.
I begin by creating several constraints to make sure that the IDs I'm using are unique:
CREATE CONSTRAINT ON (Country) ASSERT c.Name IS UNIQUE;
//and constraints for other name identifiers as well..
I then use periodic commit...
USING PERIODIC COMMIT 10000
I then LOAD in the CSV where I ignore several fields
LOAD CSV WITH HEADERS FROM "file:/path/to/file/MyFile.csv" as line
WITH line
WHERE line.CountryName IS NOT NULL AND line.CityName IS NOT NULL AND line.NeighborhoodName IS NOT NULL
I then create the necessary nodes from my data.
WITH line
MERGE(country:Country {name : line.CountryName})
MERGE(city:City {name : line.CityName})
MERGE(neighborhood:Neighborhood {
name : line.NeighborhoodName,
size : toInt(line.NeighborhoodSize),
nickname : coalesce(line.NeighborhoodNN, ""),
... 50 other features
})
MERGE (city)-[:IN]->(Country)
CREATE (neighborhood)-[:IN]->(city)
//Note that each neighborhood only appears once
Does it make sense to use CREATE UNIQUE rather than applying MERGE to any COUNTRY reference? Would this speed it up?
A ~250,000-line CSV file took over 12 hours to complete, and seemed excessively slow. What else can I be doing to speed this up? Or does it just make sense to use the annoying-looking Import Tool?