1
votes

Data arrives in the system continuously at a rate of 300-500 TPS. I need to import it into Neo4j according to the following scheme:

  1. If node N does not exist, create it
  2. If the relationship N-[rel:rel_type]->X does not exist, create it
  3. Increment rel.weight
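
To pin down the intended semantics, the three steps above can be sketched as a plain in-memory model. This is not Neo4j code. It assumes each incoming item is an (N, X) pair and that a new relationship starts at weight 0 before its first increment; the question states neither, and `apply_event` is a hypothetical helper name.

```python
def apply_event(graph, n, x, rel_type="rel_type"):
    """Apply one incoming (n, x) pair to a dict that stands in for the graph."""
    # Step 1: create node N if it does not exist yet.
    rels = graph.setdefault(n, {})
    # Step 2: create the relationship N-[rel_type]->X if it does not exist yet
    # (assumed initial weight: 0).
    key = (rel_type, x)
    rels.setdefault(key, 0)
    # Step 3: increment rel.weight.
    rels[key] += 1
    return graph


graph = {}
for n, x in [("a", "b"), ("a", "b"), ("a", "c")]:
    apply_event(graph, n, x)

print(graph)  # {'a': {('rel_type', 'b'): 2, ('rel_type', 'c'): 1}}
```

Whatever import mechanism is chosen should produce the same weights as this model for the same input stream.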

It seems impossible to solve this using the REST batch API. Issuing separate Cypher queries takes too long because they generate many small transactions.

Gremlin works much faster. I collect the parameters for a Gremlin script in an array and execute it as a batch. But even so, I could barely reach 300 TPS.

I should also mention that, alongside the import, there will be a stream of read queries at ~500 TPS:

START N=node(...) MATCH N-[rel:rel_type]->X return rel.weight,X.name;

The heap size is set to 5 GB. Additional JVM options:

-XX:MaxPermSize=1G -XX:+CMSClassUnloadingEnabled -XX:+UseParallelGC -XX:+UseNUMA

What is the optimal approach and configuration for importing this kind of data?

Have you tried using REST batch insertion without Cypher or Gremlin? The batch runs in a single transaction, and I got pretty decent performance (especially if the process doing the insertion is on the same box as the DB) - RaduK

1 Answer

3
votes

To check whether the incoming node exists and has the relationship to the other node, you can use the CREATE UNIQUE syntax.

START n=node:node_index(newNode={N})
CREATE UNIQUE n-[:REL_TYPE]->x ;

To automatically increment the weight of the relationship, I would assume something like this (no warranty on this; there is probably a faster way of doing it):

START n=node:node_index(newNode={N})
CREATE UNIQUE n-[rel:REL_TYPE]->x
SET rel.weight = coalesce(rel.weight?,0) +1
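
To avoid one small transaction per query, many executions of the query above could be packed into a single REST batch request. The sketch below only builds the JSON payload and sends nothing. It assumes the batch endpoint accepts jobs of the form `{"method", "to", "body", "id"}` with `"to": "/cypher"`, which should be checked against the REST documentation for your Neo4j version. `build_batch` is a hypothetical helper, and `node_index` / `newNode` are the placeholder names from the query above.

```python
import json

# The query from the answer above, unchanged; {N} is bound per job via "params".
UPSERT_QUERY = (
    "START n=node:node_index(newNode={N}) "
    "CREATE UNIQUE n-[rel:REL_TYPE]->x "
    "SET rel.weight = coalesce(rel.weight?,0) +1"
)


def build_batch(node_keys):
    """Return one batch job per incoming key, all sharing the same query text."""
    return [
        {
            "method": "POST",
            "to": "/cypher",  # assumed batch-relative path of the Cypher endpoint
            "body": {"query": UPSERT_QUERY, "params": {"N": key}},
            "id": job_id,
        }
        for job_id, key in enumerate(node_keys)
    ]


# Serialize a few hundred incoming keys into a single request body.
payload = json.dumps(build_batch(["key-%d" % i for i in range(300)]))
```

The payload would then be POSTed as one request to the server's batch endpoint, so the whole array runs in a single transaction, as the comment above notes. Passing `N` as a parameter rather than concatenating it into the query string keeps the query text identical across jobs.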