0 votes

We are trying to load millions of nodes and relationships into Neo4j. We are currently using the command below:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:customers.csv" AS row CREATE (:Customer ....

But it is taking a lot of time.

I did find a link which explains modifying the Neo4j store files directly: http://blog.xebia.com/combining-neo4j-and-hadoop-part-ii/

But the above link seems to be very old. Is that process still valid?

There is also an issue in the "neo4j-spark-connector" GitHub repository, which is not fully up to date:

https://github.com/neo4j-contrib/neo4j-spark-connector/issues/15

What is the best way among these?


3 Answers

2 votes

The fastest way, especially for large data sets, should be the offline import tool (neo4j-admin import) instead of Cypher with LOAD CSV.
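A minimal sketch of an invocation, assuming Neo4j 3.x, that customers.csv uses the import tool's header format (:ID, :LABEL, etc.), and a hypothetical orders.csv relationship file; exact flags vary by version:

    # run while the server is stopped, against a fresh (non-existing) database
    bin/neo4j-admin import \
        --mode=csv \
        --database=graph.db \
        --nodes=customers.csv \
        --relationships=orders.csv

The tool writes the store files directly and skips the transaction layer, which is why it is orders of magnitude faster than LOAD CSV for an initial bulk load.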

1 vote

If you are using LOAD CSV, potentially with MERGE, I highly recommend adding unique constraints. For us it sped up a smallish import (100k nodes) by a factor of roughly 100.
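A minimal sketch, using the Customer label from the question and an assumed id property (Neo4j 3.x syntax; run it before the import):

    CREATE CONSTRAINT ON (c:Customer) ASSERT c.id IS UNIQUE;

The constraint is backed by an index, so each MERGE becomes an index lookup instead of a full label scan.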

0 votes

You can make use of APOC procedures, which can perform better for large datasets. Below is a sample Cypher query:

CALL apoc.periodic.iterate(
    // outer statement streams the rows; $file_path is a parameter, e.g. 'file:///posts.csv'
    'CALL apoc.load.csv($file_path) YIELD map AS row RETURN row',
    // inner statement runs once per row, committed in batches
    'MATCH (post:Post {id: row.`:END_ID(Post)`})
     MATCH (owner:User {id: row.`:START_ID(User)`})
     MERGE (owner)-[:ASKED]->(post)',
    // parallel:false avoids deadlocks when MERGE takes locks on shared nodes across batches
    {batchSize:500, iterateList:true, parallel:false, params:{file_path:$file_path}}
);
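If you run this from cypher-shell, you can set the parameter first, e.g. with a hypothetical path:

    :param file_path => 'file:///posts.csv'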

Documentation link: https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_examples_for_apoc_load_csv