
My Neo4j graph is pretty simple: it consists of Users and "Follows" relationships between them. There is an index for the User label on the "login" property. Here is a fragment of the graph:

{
  "nodes": [
    {
      "id": "3216",
      "labels": ["User"],
      "properties": {
        "login": "user#111",
        "status": 16
      }
    },
    {
      "id": "3218",
      "labels": ["User"],
      "properties": {
        "login": "user#1983",
        "status": 1
      }
    }
  ],
  "relationships": [
    {
      "id": "4188",
      "type": "Follows",
      "startNode": "3216",
      "endNode": "3218",
      "properties": {}
    }
  ]
}
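For reference, here is the same fragment as a JavaScript object with two small helpers that read it the way the question describes. A Follows relationship runs from the follower (startNode) to the user being followed (endNode), and status 1 marks an unhandled node. This is only a sketch to make the data model concrete. followPairs and unhandledLogins are hypothetical names and are not part of the original app.

```javascript
// The graph fragment above as a plain JavaScript object.
const fragment = {
  nodes: [
    { id: '3216', labels: ['User'], properties: { login: 'user#111', status: 16 } },
    { id: '3218', labels: ['User'], properties: { login: 'user#1983', status: 1 } },
  ],
  relationships: [
    { id: '4188', type: 'Follows', startNode: '3216', endNode: '3218', properties: {} },
  ],
};

// Hypothetical helper: [followerLogin, followedLogin] for every Follows relationship.
function followPairs(graph) {
  const loginById = new Map(graph.nodes.map((n) => [n.id, n.properties.login]));
  return graph.relationships
    .filter((r) => r.type === 'Follows')
    .map((r) => [loginById.get(r.startNode), loginById.get(r.endNode)]);
}

// Hypothetical helper: logins of User nodes that still have status 1 (unhandled).
function unhandledLogins(graph) {
  return graph.nodes
    .filter((n) => n.labels.includes('User') && n.properties.status === 1)
    .map((n) => n.properties.login);
}

console.log(followPairs(fragment));     // [ [ 'user#111', 'user#1983' ] ]
console.log(unhandledLogins(fragment)); // [ 'user#1983' ]
```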

The status field of a User node indicates whether the node has been handled, i.e. whether all of its relationships have been created. I have a node.js app that performs the following steps:

  1. Selects the next node that has status = 1 (unhandled).
  2. Obtains the login of the node selected in step 1.
  3. Requests the followers of the user with that login from a web service.
  4. Adds the new User nodes and "Follows" relationships through the REST API's Cypher transactional endpoint, using Cypher statements that take care of the uniqueness of nodes and relationships. Here is a sample of the query:

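// Builds one statement for the Cypher transactional endpoint: creates any
// missing follower nodes and a Follows relationship from each follower to the user.
function buildQuery(login, followers) {
  return {
    statement: 'MATCH (me:User {login : {login} }) ' +
      'FOREACH (f IN {followers} | MERGE (u:User { login : f }) MERGE (u)-[:Follows]->(me))',
    parameters: {
      login: login,         // login of the user selected in step 1
      followers: followers  // follower logins returned by the web service in step 3
    }
  };
}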
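The four steps can be sketched as one pass of a loop against the Neo4j 2.x transactional Cypher endpoint (POST /db/data/transaction/commit with a statements array). This is a minimal sketch under stated assumptions, not the original app. The server URL, the fetchFollowers callback that stands in for the web service, and the markHandledStatement step with its HANDLED_STATUS value are all hypothetical; the question only says that status 1 means unhandled. Parameters use the 2.x {param} syntax from the question (Neo4j 3.0 and later use $param).

```javascript
// ASSUMPTIONS: TX_COMMIT_URL, HANDLED_STATUS and fetchFollowers() are hypothetical.
const TX_COMMIT_URL = 'http://localhost:7474/db/data/transaction/commit';
const HANDLED_STATUS = 16; // hypothetical; the question only defines 1 = unhandled

// Steps 1 and 2: pick one unhandled user and return its login.
function selectNextStatement() {
  return {
    statement: 'MATCH (u:User {status: 1}) RETURN u.login AS login LIMIT 1',
    parameters: {},
  };
}

// Step 4: the same statement shape as buildQuery() in the question.
function addFollowersStatement(login, followers) {
  return {
    statement:
      'MATCH (me:User {login: {login}}) ' +
      'FOREACH (f IN {followers} | MERGE (u:User {login: f}) MERGE (u)-[:Follows]->(me))',
    parameters: { login, followers },
  };
}

// Hypothetical extra step: mark the user as handled so step 1 moves on.
function markHandledStatement(login) {
  return {
    statement: 'MATCH (me:User {login: {login}}) SET me.status = {status}',
    parameters: { login, status: HANDLED_STATUS },
  };
}

// POSTs a batch of statements; the commit endpoint runs them in one transaction.
// Uses the global fetch available in Node 18+.
async function runStatements(statements) {
  const res = await fetch(TX_COMMIT_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
    body: JSON.stringify({ statements }),
  });
  const body = await res.json();
  if (body.errors && body.errors.length) throw new Error(JSON.stringify(body.errors));
  return body.results;
}

// One pass of the loop; fetchFollowers(login) stands in for the web service call.
async function processNextUser(fetchFollowers) {
  const [selected] = await runStatements([selectNextStatement()]);
  if (selected.data.length === 0) return null; // nothing left to handle
  const login = selected.data[0].row[0];
  const followers = await fetchFollowers(login);
  await runStatements([addFollowersStatement(login, followers), markHandledStatement(login)]);
  return login;
}
```

Sending the follower MERGE and the status update in the same request keeps them in one transaction, so a failure between the two cannot leave a user marked as handled without its relationships.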

At the moment the DB has 350K User nodes and 1.9M relationships, and adding new ones is TERRIBLY slow. It takes about 8 seconds to add a single follower and the corresponding relationship on a fairly powerful machine (8-core CPU, 14 GB RAM) that does nothing else: it's an Ubuntu server hosted on Azure specifically for the Neo4j DB.
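Since each iteration includes both the web service call (step 3) and the Cypher transaction (step 4), it may be worth timing them separately to see where the ~8 seconds actually go. Below is a minimal sketch using Node's perf_hooks module. timeStep is a hypothetical helper, and fetchFollowers, runStatements and buildQuery in the usage comment stand for the app's own functions.

```javascript
const { performance } = require('perf_hooks');

// Runs an async step, logs how long it took in milliseconds, and returns the timing.
async function timeStep(label, fn) {
  const start = performance.now();
  const result = await fn();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { label, ms, result };
}

// Usage sketch (fetchFollowers and runStatements are hypothetical app functions):
// const { result: followers } = await timeStep('web service', () => fetchFollowers(login));
// await timeStep('cypher', () => runStatements([buildQuery(login, followers)]));
```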

I am wondering whether there is anything I can do to improve the performance of adding new nodes. Tuning the query? Tuning the Neo4j configuration? Using the Core Java API? Something else? Thanks!

Can you run your statement with some sample data, prefixed with PROFILE, in the Neo4j browser and share the PROFILE output? - Michael Hunger
Yes, I ran the following query: PROFILE MATCH (me:User {login : "hbo"} ) UNWIND ["hbo2", "hbo3", "hbo4", "hbo21", "hbo24", "hbo20", "hbo12", "hbo32", "hbo52", "hbo92", "hbo27", "hbo29", "hbo22", "hbo42"] as f MERGE (u:User { login : f }) MERGE (u)-[:Follows]->(me) and here is the result, saved as JSON: gist.github.com/sAbakumoff/60d16e09ce3448b53bd8 - sovo2014

2 Answers


The statement should take a few ms at most.

Do you have a uniqueness constraint on :User(login)?

create constraint on (u:User) assert u.login is unique
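If the app should set this up itself, the schema statement can be sent through the same transactional endpoint. This is a sketch under assumptions: the server URL is hypothetical. Because the question mentions an existing plain index on :User(login), the sketch can optionally drop it first, since Neo4j may refuse to create a uniqueness constraint while a non-unique index exists on the same label and property (the constraint brings its own backing index). Creating the constraint will also fail if duplicate logins are already stored. Each schema statement is sent as its own transaction.

```javascript
// Hypothetical server URL for the Neo4j 2.x transactional endpoint.
const TX_COMMIT_URL = 'http://localhost:7474/db/data/transaction/commit';

// Schema statements to run, in order; dropping the old index is optional.
function schemaStatements(dropExistingIndex) {
  const steps = [];
  if (dropExistingIndex) steps.push('DROP INDEX ON :User(login)');
  steps.push('CREATE CONSTRAINT ON (u:User) ASSERT u.login IS UNIQUE');
  return steps;
}

// Request body carrying a single statement.
function txBody(statement) {
  return JSON.stringify({ statements: [{ statement, parameters: {} }] });
}

// Sends each schema statement as its own transaction; stops at the first error.
// Uses the global fetch available in Node 18+.
async function ensureLoginConstraint(dropExistingIndex) {
  for (const statement of schemaStatements(dropExistingIndex)) {
    const res = await fetch(TX_COMMIT_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
      body: txBody(statement),
    });
    const { errors } = await res.json();
    if (errors && errors.length) throw new Error(`${statement}: ${JSON.stringify(errors)}`);
  }
}
```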

Are you using Neo4j 2.3.1?

Please do both and report back.

If you can't upgrade to 2.3.1 for any reason let me know.


Could you try changing your statement to this:

MATCH (me:User {login : {login} }) 
UNWIND {followers} as f  
MERGE (u:User { login : f })
MERGE (u)-[:Follows]->(me)
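In the node.js app, this statement drops into the buildQuery function from the question. Here is a sketch that keeps the 2.x {param} placeholders used there (Neo4j 3.0 and later spell them $login and $followers):

```javascript
// buildQuery from the question, rewritten with UNWIND instead of FOREACH.
function buildQuery(login, followers) {
  return {
    statement:
      'MATCH (me:User {login: {login}}) ' +
      'UNWIND {followers} AS f ' +        // one row per follower login
      'MERGE (u:User {login: f}) ' +      // find or create the follower node
      'MERGE (u)-[:Follows]->(me)',       // find or create the relationship
    parameters: { login, followers },
  };
}

console.log(buildQuery('hbo', ['hbo2', 'hbo3']).statement);
```

Passing the follower list as a parameter, rather than inlining the logins as in the PROFILE test query, also lets Neo4j reuse the cached query plan across calls.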

MERGE inside FOREACH sometimes did not use the unique index.