
I am creating nodes in Neo4j with the Neo4j Java driver, using the following Cypher query.

String cipherQuery = "CREATE (n:MLObsTemp { personId: " + personId + ",conceptId: " + conceptId
                            + ",obsId: " + obsId + ",MLObsId: " + mlObsId + ",encounterId: " + encounterId + "}) RETURN n";

Function call for executing the query:

createNeo4JObsNode(cipherQuery);

Implementation of the function:

private void createNeo4JObsNode(String cipherQuery) throws Exception {
    try (ConNeo4j greeter = new ConNeo4j("bolt://localhost:7687", "neo4j", "qwas")) {
        System.out.println("Executing query : " + cipherQuery);

        try (Session session = driver.session()) {
            StatementResult result = session.run(cipherQuery);
        } catch (Exception e) {
            System.out.println("Error" + e.getMessage());
        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}

Creating relationships for the above nodes using the code below:

String obsMatchQuery = "MATCH (m:MLObsTemp),(o:Obs) WHERE m.obsId=o.obsId CREATE (m)-[:OBS]->(o)";
createNeo4JObsNode(obsMatchQuery);

String personMatchQuery = "MATCH (m:MLObsTemp),(p:Person) WHERE m.personId=p.personId CREATE (m)-[:PERSON]->(p)";
createNeo4JObsNode(personMatchQuery);

String encounterMatchQuery = "MATCH (m:MLObsTemp),(e:Encounter) WHERE m.encounterId=e.encounterId CREATE (m)-[:ENCOUNTER]->(e)";
createNeo4JObsNode(encounterMatchQuery);

String conceptMatchQuery = "MATCH (m:MLObsTemp),(c:Concept) WHERE m.conceptId=c.conceptId CREATE (m)-[:CONCEPT]->(c)";
createNeo4JObsNode(conceptMatchQuery);

It takes 13 seconds on average to create the nodes and 12 seconds to create the relationships. I have 350k records in my database for which I have to create nodes and their respective relationships.

How can I improve my code? Moreover, is this the best way to create nodes in Neo4j over a Bolt connection with the neo4j-java-driver?

EDIT

I am now using query parameters in my code:

HashMap<String, Object> parameters = new HashMap<String, Object>();

parameters.put("personId", 1390);
parameters.put("obsId", 14001);
parameters.put("conceptId", 5978);
parameters.put("encounterId", 10810);
parameters.put("mlobsId", 2);
         
         
         
String cypherQuery =
        "CREATE (m:MLObsTemp { personId: $personId, obsId: $obsId, conceptId: $conceptId, MLObsId: $mlobsId, encounterId: $encounterId }) "
      + "WITH m MATCH (p:Person { personId: $personId }) CREATE (m)-[:PERSON]->(p) "
      + "WITH m MATCH (e:Encounter { encounterId: $encounterId }) CREATE (m)-[:ENCOUNTER]->(e) "
      + "WITH m MATCH (o:Obs { obsId: $obsId }) CREATE (m)-[:OBS]->(o) "
      + "WITH m MATCH (c:Concept { conceptId: $conceptId }) CREATE (m)-[:CONCEPT]->(c) "
      + "RETURN m";

Node-creation code:

try {
    ConNeo4j greeter = new ConNeo4j("bolt://localhost:7687", "neo4j", "qwas");

    try (Session session = driver.session()) {
        StatementResult result = session.run(cypherQuery, parameters);
        System.out.println(result);
    } catch (Exception e) {
        System.out.println("[WARNING] Null Row");
    }

} catch (Exception e) {
    e.printStackTrace();
}

I am also creating uniqueness constraints (which back the lookups with indexes) to speed up the process:

CREATE CONSTRAINT ON (P:Person) ASSERT P.personId IS UNIQUE
CREATE CONSTRAINT ON (E:Encounter) ASSERT E.encounterId IS UNIQUE
CREATE CONSTRAINT ON (O:Obs) ASSERT O.obsId IS UNIQUE
CREATE CONSTRAINT ON (C:Concept) ASSERT C.conceptId IS UNIQUE

Here is the plan from a PROFILE of one Cypher query:

Now the performance has improved, but not significantly. I am using neo4j-java-driver version 1.6.1. How can I batch my Cypher queries to improve the performance further?

Do you have indexes created on the relevant id properties? Have you profiled your queries and inspected your query plans? Also, consider parameterizing your queries rather than using string appending. – InverseFalcon

How can I inspect my query plan and parameterize my queries? Could you elaborate further and provide some links? – Zakir saifi

Here's the link for query profiling, and here's the one for parameterizing your queries. – InverseFalcon

You should use parameters and indexes/constraints and also batch your updates, see: medium.com/neo4j/… – Michael Hunger

The query language is named Cypher, not Cipher, please use the proper name. – Michael Hunger

1 Answer


You should try to minimize redundant work in your Cypher queries.

MLObsTemp carries a lot of redundant properties, and you search for it again for every link you create. Relationships remove the need to store foreign-key properties (node ids) on the node.

I would recommend a single Cypher query that does everything together and uses parameters, like this:

CREATE (m:MLObsTemp)
WITH m MATCH (p:Person {id: $person_id}) CREATE (m)-[:PERSON]->(p)
WITH m MATCH (e:Encounter {id: $encounter_id}) CREATE (m)-[:ENCOUNTER]->(e)
WITH m MATCH (c:Concept {id: $concept_id}) CREATE (m)-[:CONCEPT]->(c)
// SNIP more MATCH/CREATE
RETURN m

This way, Neo4j doesn't have to find m again for every relationship. You don't need the id properties on MLObsTemp, because the relationship you just created effectively replaces them. Neo4j is very efficient at walking relationships, so just follow the relationship if you need the id value.
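A minimal Java sketch of sending such a combined query with parameters (the driver calls themselves are shown in comments, since they assume the neo4j-java-driver 1.x API from the question and a live Bolt server; the ids reuse the example values from the question):

```java
import java.util.HashMap;
import java.util.Map;

public class CreateMlObs {
    public static void main(String[] args) {
        // One statement: create the node, then attach all relationships.
        String query =
              "CREATE (m:MLObsTemp) "
            + "WITH m MATCH (p:Person { personId: $personId }) CREATE (m)-[:PERSON]->(p) "
            + "WITH m MATCH (e:Encounter { encounterId: $encounterId }) CREATE (m)-[:ENCOUNTER]->(e) "
            + "WITH m MATCH (c:Concept { conceptId: $conceptId }) CREATE (m)-[:CONCEPT]->(c) "
            + "RETURN m";

        // Parameters are passed as a map; placeholders stay unquoted in the query.
        Map<String, Object> params = new HashMap<>();
        params.put("personId", 1390);
        params.put("encounterId", 10810);
        params.put("conceptId", 5978);

        // With the driver on the classpath and a running server:
        // try (Session session = driver.session()) {
        //     session.run(query, params);
        // }

        System.out.println(params.size());
        System.out.println(query.contains("$personId"));
    }
}
```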

TIPS (mileage may vary across Neo4j versions):

  • Inline matching is almost always more efficient than WHERE (MATCH (n {id: "rawr"}) vs MATCH (n) WHERE n.id = "rawr").
  • Parameters make frequent, similar queries more efficient, as Neo4j will cache the query plan (the $thing_id syntax used in the above query). They also protect you from Cypher injection (see SQL injection).
  • From a Session you can create a Transaction (Session.run() actually creates a transaction for each run call). You can batch multiple Cypher statements in a single transaction (even using the results of previous statements from the same transaction), because the transaction lives in memory until you mark it successful and close it. Note that if you are not careful, the transaction can fail with an out-of-memory error, so remember to commit periodically/between batches (committing in batches of 10k records seems to be the norm when ingesting large data sets).
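As a sketch of the batching tip: split the records into fixed-size chunks and commit one transaction per chunk. The chunking logic below is plain Java; the transaction calls are shown as comments because they assume the driver 1.x API (Transaction, tx.success()) and a running server, and the batch size is shrunk from the 10k norm for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchIngest {
    // Split the rows into batches of at most batchSize parameter maps,
    // each ready to be passed to tx.run(query, row).
    static List<List<Map<String, Object>>> toBatches(List<Map<String, Object>> rows, int batchSize) {
        List<List<Map<String, Object>>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 25 dummy records, batched 10 at a time -> 3 batches (10, 10, 5).
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int id = 0; id < 25; id++) {
            Map<String, Object> row = new HashMap<>();
            row.put("personId", id);
            rows.add(row);
        }
        List<List<Map<String, Object>>> batches = toBatches(rows, 10);

        // With the driver: one transaction per batch, committed between batches,
        // so memory use stays bounded.
        // try (Session session = driver.session()) {
        //     for (List<Map<String, Object>> batch : batches) {
        //         try (Transaction tx = session.beginTransaction()) {
        //             for (Map<String, Object> row : batch) {
        //                 tx.run(query, row);
        //             }
        //             tx.success();   // driver 1.x: mark the transaction committed
        //         }
        //     }
        // }

        System.out.println(batches.size());
        System.out.println(batches.get(2).size());
    }
}
```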