Neo4j : Enterprise version 3.2
I see a tremendous difference between the following two calls in terms of speed. Here are the settings and query/API.
Page Cache : 16g | Heap : 16g
Number of row/nodes -> 600K
Cypher code | Time Taken : 50 sec.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///xyx.csv' AS row
CREATE (n:ObjectTension) SET n = row
From Java (session pool, with 15 sessions at a time as an example; see the dispatcher sketch after the thread listing):
Thread_1 : Time Taken : 8 sec / 10K
Map<String, Object> pList = new HashMap<String, Object>();
Map<String, Object> params = new HashMap<String, Object>();
try (Session session = driver.session();
     Transaction tx = session.beginTransaction()) {
    for (int i = 0; i < 10000; i++) {
        pList.put(String.valueOf(i), i * i);   // property keys must be Strings
        params.put("props", pList);
        String query = "CREATE (n:Label {props})";
        // String query = "CREATE (n:Label) SET n = {props}";
        tx.run(query, params);
    }
    tx.success();   // mark the transaction for commit
}
Thread_2 : Time taken is 9 sec / 10K
// same code as Thread_1
...
Thread_3 : Basically the above code is reused. It's just an example.
Thread_N where N = 600K / 10K = 60
Hence, the overall time taken is around 2 to 3 minutes.
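For context, the batches are dispatched over the session pool roughly as in the sketch below. This is only an illustration of the setup described above, not the original code; the pool size, batch count, and the writeBatch placeholder are assumptions.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class BatchDispatcher {

        // Placeholder for the per-thread write shown above (one session, 10K creates, one commit).
        static void writeBatch(int batchId) {
            // ... Thread_1 / Thread_2 code goes here ...
        }

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(15); // "15 sessions at a time"
            for (int batch = 0; batch < 60; batch++) {               // 600K / 10K = 60 batches
                final int batchId = batch;
                pool.submit(() -> writeBatch(batchId));
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.MINUTES);
        }
    }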
The questions are the following:
- How does LOAD CSV handle the load internally? Does it open a single session and run multiple transactions within it, or does it create multiple sessions based on the USING PERIODIC COMMIT 10000 parameter, i.e. 600K / 10000 = 60 sessions?
- What's the best way to write via Java?
The idea is to achieve the same write performance via Java as with the CSV load, which loads 12,000 nodes in ~5 seconds, or even better.
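For reference, a common way to get closer to LOAD CSV speed from the Java driver is to send one parameterized statement per batch and let UNWIND create the nodes, instead of calling tx.run() once per node. The sketch below is only an illustration under assumptions (Java driver 1.x as used with Neo4j 3.2, a driver instance and a prepared list of row maps), not a confirmed answer.

    import org.neo4j.driver.v1.Driver;
    import org.neo4j.driver.v1.Session;
    import org.neo4j.driver.v1.Transaction;

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BatchedUnwindWrite {

        // Writes one batch of node property maps in a single statement and a single commit.
        static void writeBatch(Driver driver, List<Map<String, Object>> rows) {
            Map<String, Object> params = new HashMap<String, Object>();
            params.put("rows", rows);
            try (Session session = driver.session();
                 Transaction tx = session.beginTransaction()) {
                tx.run("UNWIND {rows} AS row CREATE (n:Label) SET n = row", params);
                tx.success();
            }
        }
    }

Each call then sends a whole 10K-row batch in one round trip, which typically performs much better than 10K individual tx.run() calls inside the same transaction.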