My Neo4j graph is pretty simple: it consists of User nodes and "Follows" relationships between them. There is an index on the "login" property for the User label. Here is a fragment of the graph:
{
  "nodes": [
    {
      "id": "3216",
      "labels": [
        "User"
      ],
      "properties": {
        "login": "user#111",
        "status": 16
      }
    },
    {
      "id": "3218",
      "labels": [
        "User"
      ],
      "properties": {
        "login": "user#1983",
        "status": 1
      }
    }
  ],
  "relationships": [
    {
      "id": "4188",
      "type": "Follows",
      "startNode": "3216",
      "endNode": "3218",
      "properties": {}
    }
  ]
}
The status field of a User node indicates whether the node has been handled, that is, whether all of its relationships have been created. I also have a Node.js app that performs the following steps:
- Selects the next node that has status = 1 (unhandled).
- Obtains the login of the node selected in step 1.
- Queries a web service for the followers of the user with the login obtained in step 2.
- Adds the new User nodes and "Follows" relationships using Cypher statements that take care of the uniqueness of nodes and relationships, sent through the REST API's Cypher transactional endpoint. Here is a sample of the query:
function buildQuery(login, followers) {
  return {
    // Match the handled user, then MERGE each follower node and its Follows relationship.
    statement: 'MATCH (me:User {login : {login} }) FOREACH (f IN {followers} | MERGE (u:User { login : f }) MERGE (u)-[:Follows]->(me))',
    parameters: {
      login: login,
      followers: followers
    }
  };
}
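For context, here is a minimal sketch of how such a query object might be wrapped for the transactional endpoint. It assumes the standard `POST /db/data/transaction/commit` endpoint with a `statements` array in the body; `buildCommitRequest` is a hypothetical helper name, not part of my actual app, and the sample logins are taken from the graph fragment above. No HTTP call is made here, it only builds the request description.

```javascript
// Hypothetical helper (illustrative name): wraps one Cypher statement object
// in the JSON body expected by Neo4j's transactional HTTP endpoint.
function buildCommitRequest(query) {
  return {
    method: 'POST',
    // Assumed path: begins and commits a transaction in a single request.
    path: '/db/data/transaction/commit',
    headers: {
      'Content-Type': 'application/json',
      'Accept': 'application/json; charset=UTF-8'
    },
    // The endpoint takes a list of statements, each with its own parameters.
    body: JSON.stringify({ statements: [query] })
  };
}

// Same shape as the object returned by buildQuery.
const query = {
  statement: 'MATCH (me:User {login : {login} }) ' +
             'FOREACH (f IN {followers} | MERGE (u:User { login : f }) MERGE (u)-[:Follows]->(me))',
  parameters: { login: 'user#1983', followers: ['user#111'] }
};

const request = buildCommitRequest(query);
console.log(request.method, request.path);
console.log(request.body);
```

Each call to the web service yields one such request, so every batch of followers for a given login is sent as a single statement in its own transaction.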
At the moment the DB has 350K User nodes and 1.9M relationships, and adding new ones is TERRIBLY slow. It takes about 8 seconds to add a single follower and the corresponding relationship on a quite powerful machine (8-core CPU, 14 GB RAM) that does not do anything else: it's an Ubuntu server hosted on Azure specifically for the Neo4j DB.
Is there anything I can do to improve the performance of adding new nodes? Tuning the query? Tuning the Neo4j configuration? Using the core Java API? Something else? Thanks!