0
votes

Neo4j : Enterprise version 3.2

I see a tremendous difference in speed between the following two calls. Here are the settings and the query/API.

Page Cache : 16g | Heap : 16g

Number of rows/nodes: 600K

Cypher code (ignore any syntax issues) | Time taken: 50 sec.

    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM 'file:///xyx.csv' AS row
    CREATE (n:ObjectTension) SET n = row

From Java (session pool, with 15 sessions at a time, as an example):

Thread_1 : Time Taken : 8 sec / 10K

   Map<String, Object> pList = new HashMap<String, Object>();
   Map<String, Object> params = new HashMap<String, Object>();

   try (Transaction tx = Driver.session().beginTransaction()) {
      for (int i = 0; i < 10000; i++) {
         pList.put(Integer.toString(i), i * i);  // note: pList keeps growing across iterations
         params.put("props", pList);
         String query = "CREATE (n:Label {props})";
         // String query = "CREATE (n:Label) SET n = {props}";
         tx.run(query, params);
      }
      tx.success();
   }

Thread_2 : Time taken is 9 sec / 10K

   Map<String, Object> pList = new HashMap<String, Object>();
   Map<String, Object> params = new HashMap<String, Object>();

   try (Transaction tx = Driver.session().beginTransaction()) {
      for (int i = 0; i < 10000; i++) {
         pList.put(Integer.toString(i), i * i);  // note: pList keeps growing across iterations
         params.put("props", pList);
         String query = "CREATE (n:Label {props})";
         // String query = "CREATE (n:Label) SET n = {props}";
         tx.run(query, params);
      }
      tx.success();
   }
...
Thread_3 : Basically the above code is reused; it's just an example.

Thread_N, where N = 600K / 10K = 60

Hence, the overall time taken is around 2~3 minutes.

The questions are the following:

  1. How does the CSV load work internally? Does it open a single session with multiple transactions inside it? Or does it create multiple sessions based on the "USING PERIODIC COMMIT 10000" parameter, i.e. 600K / 10000 = 60 sessions?

  2. What's the best way to write via Java?

The idea is to achieve the same write performance via Java as with the CSV load, which loads 12000 nodes in ~5 seconds, or even better.

2
Assume that the pList is outside the try block and the data is loaded before the transaction is opened:

    Map<String, Object> pList = new HashMap<String, Object>();
    Map<String, Object> params = new HashMap<String, Object>();
    for (int i = 0; i < 10000; i++) {
        pList.put(Integer.toString(i), i * i);
        params.put("props", pList);
    }
    try (Transaction tx = Driver.session().beginTransaction()) {
        String query = "CREATE (n:Label {props})";
        // String query = "CREATE (n:Label) SET n = {props}";
        tx.run(query, params);
        tx.success();
    }

– neoman1

2 Answers

1
votes

Your Java code is doing something very different from your Cypher code, so it really makes no sense to compare processing times.

You should change your Java code to read from the same CSV file. File IO is fairly expensive, but your Java code is not doing any.
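For instance, a rough, untested sketch of reading the same file with the standard library might look like the following; the CsvRowReader name and the naive comma split are illustrative assumptions, not a real CSV parser:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CsvRowReader {
        // Reads a CSV file with a header line into a list of column->value maps,
        // comparable to what LOAD CSV WITH HEADERS hands to Cypher as "row".
        public static List<Map<String, Object>> readRows(String path) throws Exception {
            List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
            try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
                String[] headers = reader.readLine().split(",");  // naive: no quoting support
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] values = line.split(",");
                    Map<String, Object> row = new HashMap<String, Object>();
                    for (int i = 0; i < headers.length; i++) {
                        row.put(headers[i], values[i]);
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }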

Also, whereas your pure Cypher query is creating nodes with a fixed (and presumably relatively small) number of properties, your Java pList grows with every loop iteration -- so each Java loop creates nodes with between 1 and 10K properties! This may be the main reason why your Java code is much slower.

[UPDATE 1]

If you want to ignore the performance difference between using and not using a CSV file, the following (untested) code should give you an idea of what similar logic would look like in Java. In this example, the i loop assumes that your CSV file has 10 columns (you should adjust the loop to use the correct column count). Also, this example gives all the nodes the same properties, which is OK as long as you have not created a contrary uniqueness constraint.

    Session session = Driver.session();

    Map<String, Object> pList = new HashMap<String, Object>();
    for (int i = 0; i < 10; i++) {
        pList.put(Integer.toString(i), i * i);
    }

    Map<String, Object> params = new HashMap<String, Object>();
    params.put("props", pList);

    String query = "CREATE (n:Label) SET n = {props}";

    for (int j = 0; j < 60; j++) {
        try (Transaction tx = session.beginTransaction()) {
            for (int k = 0; k < 10000; k++) {
                tx.run(query, params);
            }
            tx.success(); // commit this batch of 10000 creates
        }
    }

[UPDATE 2 and 3, copied from chat and then fixed]

Since the Cypher planner is able to optimize, the actual internal logic is probably a lot more efficient than the Java code I provided (above). If you want to also optimize your Java code (which may be closer to the code that Cypher actually generates), try the following (untested) code. It sends 10000 rows of data in a single run() call, and uses the UNWIND clause to break it up into individual rows on the server.

    Session session = Driver.session();

    Map<String, Integer> pList = new HashMap<String, Integer>();
    for (int i = 0; i < 10; i++) {
        pList.put(Integer.toString(i), i * i);
    }

    // 10000 identical rows, each a map of 10 properties
    List<Map<String, Integer>> rows = Collections.nCopies(10000, pList);

    Map<String, Object> params = new HashMap<String, Object>();
    params.put("rows", rows);

    String query = "UNWIND {rows} AS row CREATE (n:Label) SET n = row";

    for (int j = 0; j < 60; j++) {
        try (Transaction tx = session.beginTransaction()) {
            tx.run(query, params);
            tx.success(); // commit this batch of 10000 nodes
        }
    }
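With 10000 rows per run() call and 60 transactions, this covers all 600K nodes in just 60 driver round trips, instead of 600K individual run() calls.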
0
votes

You can try creating the nodes using the Java API, instead of relying on Cypher:
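For example, here is a minimal, untested sketch using the embedded Java API; it assumes the code runs inside the Neo4j process against an already-initialized GraphDatabaseService named db (not over the Bolt driver), and the label and property keys mirror the earlier snippets:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;

    // Sketch: create 10K nodes per transaction via the embedded API,
    // 60 transactions for the full 600K nodes.
    void createNodes(GraphDatabaseService db) {
        Label label = Label.label("Label");
        for (int j = 0; j < 60; j++) {
            try (Transaction tx = db.beginTx()) {
                for (int k = 0; k < 10000; k++) {
                    Node n = db.createNode(label);
                    for (int i = 0; i < 10; i++) {
                        n.setProperty(Integer.toString(i), i * i);
                    }
                }
                tx.success(); // mark the transaction for commit
            }
        }
    }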

Also, as the previous answer mentioned, the props variable has different values in your two cases.

Additionally, notice that every iteration re-creates and re-submits the query string (String query = "CREATE (n:Label {props})";), which implies query parsing on each call - unless it is optimized out by Neo4j itself.
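One way to help the server cache a single plan is to hoist the constant query text out of the loop and vary only the parameters. A minimal sketch, assuming a Bolt driver Session named session; the "value" property name is illustrative:

    // Hoist the constant query text out of the loop; only the parameter values
    // change, so the server sees one statement whose plan it can cache and reuse.
    String query = "CREATE (n:Label) SET n = {props}";

    try (Transaction tx = session.beginTransaction()) {
        Map<String, Object> params = new HashMap<String, Object>();
        for (int i = 0; i < 10000; i++) {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("value", i * i);  // illustrative single property per node
            params.put("props", props);
            tx.run(query, params);
        }
        tx.success();
    }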