I have scenarios like below:

CREATE (p:Person{guid:1})
CREATE (b1:Book{guid:1})
CREATE (b2:Book{guid:2})
CREATE (b3:Book{guid:3})

MATCH (p:Person{guid:1}),(b1:Book{guid:1}) CREATE (p)-[:READ]->(b1)
MATCH (p:Person{guid:1}),(b2:Book{guid:2}) CREATE (p)-[:READ]->(b2)
MATCH (p:Person{guid:1}),(b3:Book{guid:3}) CREATE (p)-[:READ]->(b3)

Currently, the above Cypher queries are run sequentially. I need to improve the performance of my write operations.

I think the creation of p, b1, b2, and b3 can happen in parallel. Once this is done, the connections between p and b1, b2, and b3 can also be created in parallel. I also think the above queries could go in a single batch instead of separate write operations.

I am using neo4jphp and node-neo4j.

I know Neo4j offers a transactional Cypher HTTP endpoint and batch operations. Do these improve write performance? Which of them is better for the case above?

It looks like neo4jphp supports batch operations and Cypher transactions, but I am not sure whether batch/Cypher transactions are possible in node-neo4j.


1 Answer


You should use parameterized Cypher to remove the per-query overhead of parsing the statement and building up the query plan.

In your case the statement could be changed to:

MERGE (p:Person{guid:{personGuid}})
MERGE (b:Book{guid:{bookGuid}})
CREATE (p)-[:READ]->(b)

and supply as parameters:

{ "personGuid": 1, "bookGuid": 1 }
{ "personGuid": 1, "bookGuid": 2 }
{ "personGuid": 1, "bookGuid": 3 }
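With the transactional Cypher HTTP endpoint, all three parameter sets can be sent as separate statements in a single `POST /db/data/transaction/commit` request. A sketch of the request body, following the Neo4j 2.x REST API payload shape:

```json
{
  "statements": [
    {
      "statement": "MERGE (p:Person {guid: {personGuid}}) MERGE (b:Book {guid: {bookGuid}}) CREATE (p)-[:READ]->(b)",
      "parameters": { "personGuid": 1, "bookGuid": 1 }
    },
    {
      "statement": "MERGE (p:Person {guid: {personGuid}}) MERGE (b:Book {guid: {bookGuid}}) CREATE (p)-[:READ]->(b)",
      "parameters": { "personGuid": 1, "bookGuid": 2 }
    },
    {
      "statement": "MERGE (p:Person {guid: {personGuid}}) MERGE (b:Book {guid: {bookGuid}}) CREATE (p)-[:READ]->(b)",
      "parameters": { "personGuid": 1, "bookGuid": 3 }
    }
  ]
}
```

Because the statement text is identical each time, the server can reuse the cached query plan and only the parameters change.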

Be sure to have indexes:

CREATE INDEX ON :Person(guid)
CREATE INDEX ON :Book(guid)

Using the transactional endpoint, try to aggregate roughly 10k-50k basic operations into one transaction to strike a good balance between memory consumption and transactional overhead.
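The batching above can be sketched in Node.js as follows. This is a minimal sketch, not node-neo4j's API: `rows` is a hypothetical array of `{personGuid, bookGuid}` objects, and each resulting payload would be POSTed to the transactional endpoint (`/db/data/transaction/commit` in the Neo4j 2.x REST API):

```javascript
// Group many parameterized statements into transaction-sized payloads
// for Neo4j's transactional Cypher HTTP endpoint.
var BATCH_SIZE = 10000; // ~10k-50k basic operations per transaction

// One statement string, reused for every row so the query plan is cached.
var CYPHER =
  'MERGE (p:Person {guid: {personGuid}}) ' +
  'MERGE (b:Book {guid: {bookGuid}}) ' +
  'CREATE (p)-[:READ]->(b)';

// Split an array of parameter objects into chunks of at most `size`.
function chunk(rows, size) {
  var chunks = [];
  for (var i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

// Build one request body per chunk, in the { statements: [...] } shape
// expected by POST /db/data/transaction/commit.
function buildPayloads(rows) {
  return chunk(rows, BATCH_SIZE).map(function (batch) {
    return {
      statements: batch.map(function (params) {
        return { statement: CYPHER, parameters: params };
      })
    };
  });
}
```

Each payload then commits as one transaction, so thousands of relationship writes share a single round trip and commit instead of one per query.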