
I am evaluating the performance of the Neo4j graph database with a simple benchmark for insert, update, delete and query. Using Neo4j OGM, I see significantly slower execution times (about 2-4 times) compared to direct access via the Neo4j driver. For example, the delete operation (see code below) on 10K nodes and 11K relationships takes about 500 ms with the driver vs 1200 ms with the OGM on my machine. I wonder why this happens, especially because the deletion code below doesn't even use any node entity. I can imagine that the OGM has some overhead, but this seems to be too much. Does anyone have an idea why it's slower?

Example node:

public abstract class AbstractBaseNode {

    @GraphId
    @Index(unique = true)
    private Long id;

    public Long getId() {
        return id;
    }
}

@NodeEntity
public class Company extends AbstractBaseNode {

    private String name;

    public Company(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

Example code for delete via driver:

driver = GraphDatabase.driver( "bolt://localhost:7687", AuthTokens.basic( "neo4j", "secret" ) );
session = driver.session();

long start = System.nanoTime();

session.run("MATCH (n) DETACH DELETE n").list();

System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");

Example code for delete via OGM:

public org.neo4j.ogm.config.Configuration neo4jConfiguration() {
    org.neo4j.ogm.config.Configuration config =  new org.neo4j.ogm.config.Configuration();
    config.autoIndexConfiguration().setAutoIndex(AutoIndexMode.DUMP.getName());
    config.driverConfiguration()
            .setDriverClassName("org.neo4j.ogm.drivers.bolt.driver.BoltDriver")
            .setURI("bolt://neo4j:secret@localhost")
            .setConnectionPoolSize(10);

    return config;
}

sessionFactory = new SessionFactory(neo4jConfiguration(), "net.mypackage");
session = sessionFactory.openSession();

long start = System.nanoTime();

session.query("MATCH (n) DETACH DELETE n", Collections.emptyMap()).forEach(x -> {});

System.out.println("Deleted all nodes " + ((System.nanoTime() - start) / 1000) + "μs");
For this particular query it should really be doing the same thing. I tried to reproduce this, but I don't see much difference between the two. What versions of OGM and neo4j-java-driver did you use? Do you have a proper benchmark that replicates this and that you could share? - František Hartman
Thanks so far. I will try to minimize the example and upload it. - Steffen Harbich
I uploaded a Gradle project at dropbox.com/s/uf6oqrn9to0ax1j/neo4j%20min.zip?dl=0. There are two tests, one for driver access and one for OGM access to Neo4j. You can execute both test classes several times to get average measurements. As a requirement, Neo4j Community needs to be running with default settings. I couldn't reproduce the huge difference for the delete operation, but I could for the creation of the nodes. - Steffen Harbich
And I found out that, with my initial benchmark, neo4j-java-driver version 1.1.0 was a lot faster than 1.3.0, but I cannot reproduce that with my uploaded example. - Steffen Harbich
@SteffenHarbich, I don't think your example counts as an MCVE. I don't see a major difference between the two for delete, and for create the reason for the difference is obvious: in one case you write the queries directly, and in the other you make the OGM analyze the object graph (yes, it is a single object, but the OGM can't know that beforehand) and create the queries for you. If you change the OGM test to use session.query to create the records as well, the results seem to be almost indistinguishable. - SergGr

1 Answer


I will start by pointing out that your test samples are poor. When taking timing samples, you want to stress the system so that each run takes a fair amount of time. The tests should also measure what you're actually interested in (are you testing how fast you can create and drop connections? Maximum Cypher throughput? The speed of a single large transaction?). With tests that take barely a second, it is impossible to tell whether a difference in performance comes from the query call or just from startup overhead (despite the name, the session doesn't actually connect until you call query(...)).
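To illustrate the point about startup overhead, here is a minimal timing sketch in plain Java. It assumes the operation under test can be wrapped in a Runnable and repeated (for a delete benchmark, the data would have to be recreated before each run). The class name, method names and the placeholder workload are made up for this example and are not part of the driver or the OGM.

```java
import java.util.Arrays;

final class BenchmarkSketch {

    /**
     * Runs the task `warmups` times without timing it, then returns the
     * duration in nanoseconds of each of `runs` measured executions.
     */
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed: absorbs connection setup, class loading, JIT
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples (the upper middle element for an even count). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption): in a real benchmark this
        // would be the driver call or the OGM call being compared.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        long[] samples = measure(task, 5, 20);
        System.out.println("median: " + (median(samples) / 1000) + "μs");
    }
}
```

Comparing medians over many runs, after a few untimed warm-up runs, makes it less likely that a one-off cost such as the first connection dominates the result.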

As far as I can tell, both versions perform about the same in a normal setup. The only thing I can think of that would affect this is the OGM doing something that starves other processes of system resources.

UPDATE

UNWIND {rows} as row
CREATE (n:Company)
SET n=row.props
RETURN row.nodeRef as ref, ID(n) as id, row.type as type
with params {rows=[{nodeRef=-1206180304, type=node, props={name=company_1029}}]}

VS

CREATE (a:Company {name: {name}}) // executed 10,000 times

The key difference between the driver and the OGM is that the driver does exactly what you tell it to do, which is the most efficient way of doing things, whereas the OGM tries to manage the query logic for you (what to return, how to save things, what to try to save). The OGM version is more reliable because it will automatically try to reconcile nodes with the database (if possible) and will only save things that have actually changed. Since your node class doesn't have a primary key to reconcile on, it has to create everything.

The OGM Cypher is more versatile, but it also requires more memory use/access. SET n.name="rawr" is 1 db hit per property. SET n={name:"rawr"} is 3 db hits, though (about 1 + 2 * number of properties, so {name:"rawr", id:2} is 5 db hits). That is why the OGM Cypher is slower.

The OGM does have smart management, however: if you then edit one node and try to save it, with the driver you would either have to save everything or implement your own change manager, while the OGM will only save the updated node.
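For comparison, a hand-written driver version could combine both ideas: batch the rows with UNWIND as the OGM does, but set the single property explicitly instead of assigning a whole map. The sketch below only builds the parameter maps in plain Java. The class name, the batch size and the `runner` callback are assumptions made for illustration; whether this is actually faster would have to be measured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

final class BatchCreateSketch {

    // Uses the same {rows} parameter syntax as the logged OGM statement,
    // but sets the one property explicitly instead of SET n = row.props.
    static final String QUERY =
            "UNWIND {rows} AS row CREATE (n:Company) SET n.name = row.name";

    /** Splits the names into batches and builds one parameter map per batch. */
    static List<Map<String, Object>> buildBatches(List<String> names, int batchSize) {
        List<Map<String, Object>> batches = new ArrayList<>();
        for (int from = 0; from < names.size(); from += batchSize) {
            int to = Math.min(from + batchSize, names.size());
            List<Map<String, Object>> rows = new ArrayList<>();
            for (String name : names.subList(from, to)) {
                Map<String, Object> row = new HashMap<>();
                row.put("name", name);
                rows.add(row);
            }
            Map<String, Object> params = new HashMap<>();
            params.put("rows", rows);
            batches.add(params);
        }
        return batches;
    }

    /** Sends one statement per batch through the supplied runner. */
    static void createAll(List<String> names, int batchSize,
                          BiConsumer<String, Map<String, Object>> runner) {
        for (Map<String, Object> params : buildBatches(names, batchSize)) {
            runner.accept(QUERY, params);
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("company_" + i);
        }
        // Stand-in runner that only prints; a real one would call the database.
        createAll(names, 4, (query, params) -> System.out.println(query + " " + params));
    }
}
```

With the driver session from the question, the runner could be `(q, p) -> session.run(q, p)`; with the OGM session, `(q, p) -> session.query(q, p)`.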

So in short, the Cypher that the OGM generates is less efficient than what you would write yourself using the driver, but the OGM has smart management built in that can make it faster than a blind driver implementation in real business-logic situations (loading and editing large numbers of nodes). Of course, you can implement your own management on top of the driver and be faster, so it's a trade-off between speed and development effort. The more speed you want, the more time you have to put into managing every tiny aspect (and the point of the OGM is that you plug it in and it just works).
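As a rough idea of what "implementing your own management" on top of the driver could look like, here is a minimal change-tracking sketch in plain Java. It is not how the OGM does it internally. The `Node` class is a hypothetical stand-in for a mapped entity such as Company, and the tracker only remembers the name each node had when it was loaded, so that only modified nodes need to be written back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class DirtyTrackingSketch {

    /** Hypothetical stand-in for a mapped node such as Company. */
    static final class Node {
        final long id;
        String name;

        Node(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Snapshot of each node's name at load time, keyed by node id.
    private final Map<Long, String> loadedNames = new HashMap<>();

    /** Records the state of a node right after it was loaded. */
    void register(Node node) {
        loadedNames.put(node.id, node.name);
    }

    /** Returns the nodes that are new or whose name changed since register(). */
    List<Node> dirty(List<Node> nodes) {
        List<Node> changed = new ArrayList<>();
        for (Node node : nodes) {
            boolean known = loadedNames.containsKey(node.id);
            if (!known || !Objects.equals(loadedNames.get(node.id), node.name)) {
                changed.add(node);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        DirtyTrackingSketch tracker = new DirtyTrackingSketch();
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Node node = new Node(i, "company_" + i);
            tracker.register(node);
            nodes.add(node);
        }
        nodes.get(1).name = "renamed";
        // Only the renamed node would need an update statement.
        System.out.println(tracker.dirty(nodes).size() + " node(s) to save");
    }
}
```

Only the nodes returned by `dirty(...)` would then be sent to the database, which is the kind of bookkeeping the OGM takes off your hands.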