I'm trying to benchmark massive insertion into Neo4j in a client-server environment. So far I've found only two ways to do it:
- use REST
- implement a server extension
I can say upfront that our design requires inserting from many concurrently running processes/machines, so batch insertion over a direct connection is not an option.
I would also like to avoid implementing a server extension, as we already have a tight schedule.
I benchmarked massive insertion via REST from just a single client, sending 2 kinds of very simple Cypher queries:
create (vertex:V {guid: {guid}, vtype: {vtype}, random1: {random1}, random2: {random2} })
match (a:V {guid: {a} }) match (b:V {guid: {b} }) create (a)-[:label]->(b)
The guid field had an index.
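For concreteness, here is a minimal sketch of how a request body for one of these queries could be built. It assumes the transactional HTTP endpoint of Neo4j 2.x (`/db/data/transaction/commit`); the URL, the helper names `statement` and `request_body`, and the sample parameter values are illustrative assumptions, not my exact benchmark code.

```python
import json

# Assumption: Neo4j 2.x transactional HTTP endpoint, auto-commit variant.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

CREATE_VERTEX = (
    "create (vertex:V {guid: {guid}, vtype: {vtype}, "
    "random1: {random1}, random2: {random2} })"
)
CREATE_EDGE = (
    "match (a:V {guid: {a} }) match (b:V {guid: {b} }) "
    "create (a)-[:label]->(b)"
)


def statement(query, **parameters):
    """Pair a Cypher query with its parameters (hypothetical helper)."""
    return {"statement": query, "parameters": parameters}


def request_body(*statements):
    """Serialize one or more statements into a single JSON request body."""
    return json.dumps({"statements": list(statements)})


# One vertex insert per request; the values are made up for illustration.
body = request_body(
    statement(CREATE_VERTEX, guid="g-1", vtype="t", random1=1, random2=2)
)
```

The body would then be POSTed to `COMMIT_URL` with a JSON content type, one request per insert.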
Results so far are very poor: around (10k V + 40k E) in 13 minutes, compared to competing products like Titan or Orient, which provide an efficient server out of the box and throughput of around (10k V + 40k E) per minute.
I tried longer-lasting transactions and query parameters; neither gave any significant gain. Furthermore, the overhead from REST itself is very small: I tested dummy queries and they execute much faster (both client and server are on the same machine). I also tried inserting from multiple threads, but performance does not scale up.
I found another StackOverflow question, where the advice was to batch inserts into large requests containing thousands of commands and commit periodically. Unfortunately, due to the way we generate the data, batching the requests is not feasible. Ideally we'd like the inserts to be atomic operations whose results appear as soon as they are executed (in fact, we don't need transactions at all).
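For reference, the batching advice amounts to roughly the following. This is a sketch only, assuming statements are dicts in the shape the transactional endpoint expects; `chunked`, `batched_bodies`, and the batch size of 1000 are hypothetical names and values chosen for illustration.

```python
import json
from itertools import islice


def chunked(statements, size):
    """Yield successive lists of at most `size` statements."""
    iterator = iter(statements)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def batched_bodies(statements, size=1000):
    """Yield one JSON request body per batch; each would be one commit."""
    for chunk in chunked(statements, size):
        yield json.dumps({"statements": chunk})
```

Each yielded body carries up to `size` statements in a single request, which is exactly what our data generation cannot easily provide.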
Thus my questions are:
- are my Cypher queries optimal for the insertion?
- are the results so far in line with what can be achieved with REST (or can I squeeze much more out of REST)?
- are there any other ways to perform efficient multi-client massive insertion?
Is {guid} the same as either {a} or {b}? Also, have you already created an index (or uniqueness constraint) on :V(guid)? – cybersam