
Neo4j writes slow down after a few thousand records are written.

Two indexes exist to speed up the query. Also, using EXPLAIN I know that each query runs in constant time.

        indexOrder = schema.indexFor(ORDER)
                .on("id")
                .create();
        indexShop = schema.indexFor(SHOP)
                .on("domain")
                .create();

This is the query I use:

 WITH {json} as log 
 WITH log.order as order, log.shop as shop 
 MERGE (s:Shop {domain:shop.domain})
    ON CREATE SET s=shop
 MERGE (s)-[:Scored]->(r:Order {id:order.id})
    ON CREATE SET r=order

Here is how I commit it to the DB:

private void log() {
    try (Transaction tx = graphDb.beginTx()) {
        for (Map<String, Object> map : list) {
            graphDb.execute(query, singletonMap("json", map));
        }
        list = new ArrayList<>();
        tx.success();
    }
}

And I call the above once 1k logs have accumulated:

        list.add(map);
        count++;
        if (count % 1000 == 0) {
            log();
            System.out.println(count);
        }
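
For completeness, any entries left over when the total is not a multiple of 1000 would be flushed at the end, e.g.:

        // Flush the tail of the buffer (fewer than 1000 entries).
        if (!list.isEmpty()) {
            log();
        }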

Additional info: I use these config settings:

            .setConfig(GraphDatabaseSettings.pagecache_memory, "512M")
            .setConfig(GraphDatabaseSettings.string_block_size, "60")
            .setConfig(GraphDatabaseSettings.array_block_size, "300")
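
For completeness, these are applied through the embedded database builder, roughly like this (the store path is illustrative):

            GraphDatabaseService graphDb = new GraphDatabaseFactory()
                    .newEmbeddedDatabaseBuilder(new File("data/graph.db"))
                    .setConfig(GraphDatabaseSettings.pagecache_memory, "512M")
                    .setConfig(GraphDatabaseSettings.string_block_size, "60")
                    .setConfig(GraphDatabaseSettings.array_block_size, "300")
                    .newGraphDatabase();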

This system works up to 200k entries if it is all done within one transaction, but then it runs into memory issues.

So, why does the 1k-entries-per-transaction approach grind to a halt after just 5 transactions (5k entries) are committed to the database?

How do I fix the problem?


1 Answer

  1. You should do a MERGE on the Order node by itself, to allow the Cypher planner to use the :Order(id) index. (Besides, you should MERGE the nodes in a path pattern before doing a MERGE on the path pattern anyway, to avoid creating duplicate nodes in certain circumstances.) With this change, your query will look like this (but it is not yet ideal):

    WITH {json} as log 
    WITH log.order as order, log.shop as shop 
    MERGE (s:Shop {domain:shop.domain})
      ON CREATE SET s=shop
    MERGE (r:Order {id:order.id})
      ON CREATE SET r=order
    MERGE (s)-[:Scored]->(r)
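
    As a side note, you may want to use uniqueness constraints instead of plain indexes here; a constraint creates the backing index automatically and also enforces the no-duplicates guarantee at the database level. (You would have to drop the existing plain indexes first, since an index and a constraint cannot coexist on the same label/property.) A sketch using the embedded API, assuming the same ORDER and SHOP label constants:

        try (Transaction tx = graphDb.beginTx()) {
            // Each constraint creates a backing index and rejects
            // duplicate ids/domains outright.
            graphDb.schema().constraintFor(ORDER)
                    .assertPropertyIsUnique("id").create();
            graphDb.schema().constraintFor(SHOP)
                    .assertPropertyIsUnique("domain").create();
            tx.success();
        }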
    
  2. You should minimize the number of calls to execute(), as each call has a lot of overhead. In fact, you can easily make a single query to handle an entire list of 1000 items. So, you can change your log() code to the following (I assume that list is defined to be List<Map<String, Object>> or something like that):

    private void log() {
        try (Transaction tx = graphDb.beginTx()) {
            graphDb.execute(query, singletonMap("logList", list));
            list = new ArrayList<Map<String, Object>>();
            tx.success();
        }
    }
    

    And here is the corresponding Cypher query:

    UNWIND {logList} as log 
    WITH log.order as order, log.shop as shop 
    MERGE (s:Shop {domain:shop.domain})
      ON CREATE SET s=shop
    MERGE (r:Order {id:order.id})
      ON CREATE SET r=order
    MERGE (s)-[:Scored]->(r)
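
    For reference, that query can live next to log() as a plain string constant (the field name query is taken from your code):

        private static final String query =
                "UNWIND {logList} AS log " +
                "WITH log.order AS order, log.shop AS shop " +
                "MERGE (s:Shop {domain: shop.domain}) " +
                "  ON CREATE SET s = shop " +
                "MERGE (r:Order {id: order.id}) " +
                "  ON CREATE SET r = order " +
                "MERGE (s)-[:Scored]->(r)";

    With this in place, each 1000-entry batch costs a single execute() call and one transaction instead of 1000 calls, which is where most of the per-entry overhead was going.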