I want to create 100k nodes in Neo4j. Which approach is best suited to this: Cypher, CSV upload, or something else?

I just tried creating some nodes and relationships from Java by running parameterized Cypher:

for(int i = 0; i < 100000; i++)
{
    params.put("param1", "param1_val_" + i);
    //...
    params.put("param10", "param10_val_" + i); 
    neo4jsession.run(cypher, params);
}

Running 20,000 iterations of the loop above took 15 minutes. Each Cypher execution creates a set of three nodes and two relationships.

Earlier I tried running non-parameterized Cypher, but it was even slower (it seems that Neo4j rebuilds the query plan for every query). Is there a better way to optimize Cypher run through the Neo4j Java API? Or should we use a different approach, such as CSV upload, for this kind of bulk node and relationship creation? Can any approach deliver a further performance gain?

1 Answer

Running a single insert per transaction won't perform well. Batch multiple inserts at a time instead: pass a collection of inputs as a parameter, UNWIND it within the Cypher query, and perform your writes once per batch.
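
As a rough illustration of that idea, here is a minimal sketch of the client-side batching. The class name `UnwindBatcher`, the helper `sendInBatches`, the labels `A`/`B`, the relationship type `REL`, and the `$rows` parameter name are all made up for this example, and the query covers only two of the nodes from the question. The `sink` stands in for whatever actually runs the query, so the batching logic itself has no driver dependency.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class UnwindBatcher {
    // Hypothetical query: labels, property names, and relationship type are placeholders.
    // One execution creates nodes and relationships for every map in $rows.
    static final String CYPHER =
        "UNWIND $rows AS row "
      + "CREATE (a:A {name: row.param1}) "
      + "CREATE (b:B {name: row.param2}) "
      + "CREATE (a)-[:REL]->(b)";

    // Builds the per-row inputs that the original loop put into params one at a time.
    static List<Map<String, Object>> buildRows(int count) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("param1", "param1_val_" + i);
            row.put("param2", "param2_val_" + i);
            rows.add(row);
        }
        return rows;
    }

    // Splits rows into chunks of at most batchSize and hands each chunk
    // to the sink as a parameter map of the form {"rows": [...]}.
    // Returns the number of batches sent.
    static int sendInBatches(List<Map<String, Object>> rows, int batchSize,
                             Consumer<Map<String, Object>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            Map<String, Object> params = new HashMap<>();
            params.put("rows", new ArrayList<>(rows.subList(from, to)));
            sink.accept(params);
            batches++;
        }
        return batches;
    }
}
```

With the session from the question, the sink could be something like `params -> neo4jsession.run(UnwindBatcher.CYPHER, params)`, so that 100,000 rows with a batch size of 10,000 become 10 query executions instead of 100,000. The batch size is something to tune for your data and memory; 10,000 is only a starting guess.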

You can do this via LOAD CSV with USING PERIODIC COMMIT, or you can do it yourself with these batching techniques.
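
For the LOAD CSV route, a sketch along these lines might help. It writes the same generated values to a CSV file and keeps the import query as a string. The class name `CsvExport`, the file name `rows.csv`, the labels, and the commit size of 1000 are assumptions for illustration, and the values are assumed to contain no commas or quotes, so no CSV escaping is done.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CsvExport {
    // Hypothetical query: file name, labels, and property names are placeholders.
    // USING PERIODIC COMMIT has to run as an auto-commit statement,
    // not inside an explicitly opened transaction.
    static final String LOAD_CSV =
        "USING PERIODIC COMMIT 1000 "
      + "LOAD CSV WITH HEADERS FROM 'file:///rows.csv' AS row "
      + "CREATE (a:A {name: row.param1}) "
      + "CREATE (b:B {name: row.param2}) "
      + "CREATE (a)-[:REL]->(b)";

    // Builds a header line followed by one line per generated row.
    static List<String> buildLines(int count) {
        List<String> lines = new ArrayList<>();
        lines.add("param1,param2");
        for (int i = 0; i < count; i++) {
            lines.add("param1_val_" + i + ",param2_val_" + i);
        }
        return lines;
    }

    // Writes the generated lines to the target file as UTF-8.
    static void writeCsv(Path target, int count) throws IOException {
        Files.write(target, buildLines(count), StandardCharsets.UTF_8);
    }
}
```

By default a `file:///` URL is resolved against the server's import directory, so the file would need to be written or copied there before running the query.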

APOC Procedures also includes batching procedures, which are useful for batching changes to your graph when you're not importing data from outside.