Speedup relationship and node creation using cypher in Neo4j

Question

I have two CSV files, A and B. File A contains 7,000 rows with 6 properties, and File B contains 10M rows with 11 properties. File A has the property PKA, which is used as its primary key, whereas File B has the property FKA, which is a foreign key referencing PKA.

I want to load these files into Neo4j as follows:
1 - insert a new node for each row of File A and File B
2 - add a relationship between each pair of nodes linked by the primary key/foreign key relationship described above

Currently, I insert these files with BatchInserter via the Java API, adding a node for each row and setting the labels "A" and "B" for File A and File B respectively. I have also created two indexes, one for PKA and one for FKA. To add the relationships, I intend to run the following Cypher statement (from Neo4jShell):

match (a:A), (b:B) where a.PKA=b.FKB create (a)<-[:KEYREL]-(b);

My problems are:
- Adding the nodes with BatchInserter takes 14 minutes for File B (the bigger one) with only one commit at the end (~12k nodes/sec, ~130k properties/sec). I want to speed up the import by a factor of 2.
- The Cypher query cannot cope with this dataset size, but I would like to make it possible.

I'm running on a VM with a dual-core Intel Xeon @ 2.6 GHz and 8 GB RAM, with 64-bit Windows and 64-bit Java 8 installed. I ran my Java import program and Neo4jShell with the following Java options:

-server -XX:+UseConcMarkSweepGC -Xms2000m -Xmx5000m

don't forget to remove the FKB after you created the relationships — Michael Hunger

songololo · Accepted Answer · 2014-06-12T10:36:10

Running MATCH is typically quite slow when employed on a large volume of data.

You could try to speed it up by creating a uniqueness constraint on the nodes, which declares the property unique for each node and also backs it with an index. This can speed up the MATCH operation, though creating the constraint also takes time:

CREATE CONSTRAINT ON (a:A) ASSERT a.PKA IS UNIQUE;
CREATE INDEX ON :B(FKB);

You can then run the MATCH. You can drive it from a third CSV file, as described in the Neo4j docs, which cover a scenario similar to yours.
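
A minimal sketch of what driving the MATCH from a CSV file could look like, assuming Neo4j 2.1 or later (where LOAD CSV is available). It reuses File A as the driving file rather than a separate third file. The path file:///path/to/A.csv and the header name PKA are placeholders, and the batch size is an arbitrary choice, so adjust all three to your setup:

```cypher
// Hypothetical path and header name; not taken from the question.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///path/to/A.csv" AS row
// LOAD CSV yields strings; if the properties were stored as numbers,
// convert first, e.g. toInt(row.PKA).
MATCH (a:A {PKA: row.PKA})
MATCH (b:B {FKB: row.PKA})
CREATE (a)<-[:KEYREL]-(b);
```

Each row then becomes two lookups by a concrete value, which the constraint on :A(PKA) and an index on the foreign-key property of :B can serve, and the periodic commit keeps a single transaction from having to hold every new relationship at once.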
