I have a Spring Data Neo4j application that needs to perform bulk writes to and reads from Neo4j Community Edition (3.2).

My system configuration (MacBook Pro): 16 GB RAM, 2.5 GHz Intel Core i7.

Total nodes: 120,000 (5 properties in each node).

Each node has 500 relationships.

These nodes and relationships are part of the initial data that other parts of the application need in order to work.

I use Spring Data Neo4j for read/write transactions. Each node builds its 500 relationships sequentially, so it takes a significant amount of time to build all of the nodes and relationships above.

Sample Code:

Entity:

// Neo4j entity class
import java.util.ArrayList;
import java.util.List;

import org.neo4j.ogm.annotation.GraphId;
import org.neo4j.ogm.annotation.NodeEntity;
import org.neo4j.ogm.annotation.Relationship;

@NodeEntity
public class SamplePojo {

    @GraphId
    public Long id;
    private String property1;
    private String property2;
    private Integer property3;
    private Double property4;
    private Integer property5;

    // Outgoing relationships to other SamplePojo nodes (500 per node)
    @Relationship(type = "has_sample_relationship", direction = Relationship.OUTGOING)
    List<SamplePojo> sampleList = new ArrayList<>();

    // Getters and setters...

}

Repository:

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface SamplePojoRepository extends GraphRepository<SamplePojo> {

    // save(...) is inherited from GraphRepository

}

Service class:

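import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class DataInsertion {

    @Autowired
    SamplePojoRepository repository;

    public void writeToNeo4j(List<SamplePojo> pojoList) {

        // Loop through more than 100,000 objects that already have their
        // properties and relationships set
        for (SamplePojo p : pojoList) {
            repository.save(p); // save each object to the Neo4j db, one call per node
        }
    }
}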

My Observation:

Initially, for the first few minutes, it managed about 1200 write operations/minute.

After a few minutes, the rate dropped significantly, from 1200 to 100 write operations/minute.

Later, it dropped to 10 write operations/minute.

Does anyone know the root cause of why Neo4j write operations slow down over time?

Please let me know if additional information is needed and I will update the question. Thanks in advance!

I have read stackoverflow.com/questions/19589687/… but none of the answers explain exactly why it happens. - Tanmay Delhikar

1 Answer

This is a very broad question. You should at least profile your application to identify which part slows down: is it Neo4j itself? A particular query? Spring Data Neo4j? Your application? Then it will be easier to help you.
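Short of attaching a real profiler, a rough first step is to record how long each fixed-size window of saves takes, so you can see when the rate starts to drop. The following is only a minimal sketch: the class ThroughputProbe and its method timeWindows are hypothetical names made up for illustration, and the action passed in stands in for whatever your loop does per item (for example the repository.save(p) call from the question).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Neo4j, OGM or SDN.
final class ThroughputProbe {

    /**
     * Runs action on every item and returns the elapsed nanoseconds for each
     * consecutive window of windowSize items (the last window may be shorter).
     */
    public static <T> List<Long> timeWindows(List<T> items, int windowSize, Consumer<T> action) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Long> elapsedPerWindow = new ArrayList<>();
        long windowStart = System.nanoTime();
        int inWindow = 0;
        for (T item : items) {
            action.accept(item); // e.g. one save call per item
            inWindow++;
            if (inWindow == windowSize) {
                long now = System.nanoTime();
                elapsedPerWindow.add(now - windowStart);
                windowStart = now;
                inWindow = 0;
            }
        }
        if (inWindow > 0) {
            // Record the trailing, partially filled window as well.
            elapsedPerWindow.add(System.nanoTime() - windowStart);
        }
        return elapsedPerWindow;
    }
}
```

If the per-window times grow steadily, comparing them against GC logs from both the application and Neo4j is one way to tell memory pressure apart from a slow query.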

The usual suspects are:

  • your transaction is too large - split the load into smaller transactions of 1k to 50k elements (nodes + relationships + properties). This is needed because Neo4j holds transaction state in memory, and it might spend too much time in GC (or even run out of memory) when you have large transactions.

  • a growing OGM session - again causing too much time to be spent in GC. Clear the Session from time to time (with SDN this should happen automatically when a @Transactional method finishes).

  • some operation without an index becomes slow as the amount of data grows (e.g. doing a full node label scan instead of using an index)

  • too little memory for Neo4j or your application - time is spent mostly in GC

  • there might be a performance issue with SDN/OGM - a reproducible test case would be great for this.
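The first two points can be sketched as a batching loop. This is a minimal illustration, not SDN code: BatchLoader, partition and loadInBatches are hypothetical names, and the saveBatch callback is an assumption standing in for a method of your own that saves one batch in its own transaction. As a rough estimate from the numbers in the question, one node with 5 properties and 500 relationships is about 506 elements, so 1k to 50k elements per transaction corresponds to roughly 2 to 100 nodes per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Neo4j, OGM or SDN.
final class BatchLoader {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /**
     * Hands each batch to saveBatch and returns the number of batches processed.
     * Assumption: saveBatch commits one transaction per call, so transaction
     * state and session contents stay bounded by the batch size.
     */
    public static <T> int loadInBatches(List<T> items, int batchSize, Consumer<List<T>> saveBatch) {
        int processed = 0;
        for (List<T> batch : partition(items, batchSize)) {
            saveBatch.accept(batch);
            processed++;
        }
        return processed;
    }
}
```

In the service from the question, the call might look like BatchLoader.loadInBatches(pojoList, 50, batch -> batchSaver.saveBatch(batch)), where batchSaver is a hypothetical separate Spring bean whose saveBatch method is annotated with @Transactional. It needs to be a separate bean because Spring applies @Transactional through a proxy, so a call to a method on the same object would not start a new transaction.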