
I was happily using Neo4j 1.8.1 community edition for a while on my system with the following configuration.

System Specs:

  • OS: 32-bit Ubuntu 12.04.3 LTS. Kernel version 3.2.0-52-generic-pae #78-Ubuntu
  • Memory: 4GB
  • Swap: 8GB (swapfile - not a partition)
  • Processor: Intel® Core™ i5-2430M CPU @ 2.40GHz - Quad Core
  • Hard disk: 500GB Seagate ATA ST9500420AS. Dual boot: Ubuntu uses 100GB and the rest goes to the almighty Windows 7.

When I switched to Neo4j 2.0.1 enterprise edition, my application's response time became 4x slower. So, as advised in http://docs.neo4j.org/chunked/stable/embedded-configuration.html, I started tuning my filesystem, virtual memory, I/O scheduler, and JVM configurations.

Performance Tuning

  • Started Neo4j as a server with the highest scheduling priority (nice value = -20)

  • Set vm.dirty_background_ratio=50 and vm.dirty_ratio=80 in /etc/sysctl.conf to reduce frequent flushing of dirty memory pages to disk.

  • Increased the maximum number of open files from 1024 to 40,000, as suggested in the Neo4j startup warning.

  • Set noatime,nodiratime for the neo4j ext4 partition in /etc/fstab so that inodes don't get updated every time there is a file/directory access.

  • Changed the I/O scheduler from "cfq" to "noop", as described in http://www.cyberciti.biz/faq/linux-change-io-scheduler-for-harddisk/

  • JVM parameters: in short, the max heap size is 1GB and the neostore memory-mapped files total 425MB:

    -Xms1024m -Xmx1024m            (1GB heap)
    -XX:+UseConcMarkSweepGC        (Concurrent Mark-Sweep GC)
    neostore.nodestore.db.mapped_memory=25M
    neostore.relationshipstore.db.mapped_memory=50M
    neostore.propertystore.db.mapped_memory=90M
    neostore.propertystore.db.strings.mapped_memory=130M
    neostore.propertystore.db.arrays.mapped_memory=130M

Sadly, this didn't make any difference. To get a clearer picture, I wrote a simple script which creates N nodes and M random relationships among those nodes.
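
For reference, here is a minimal sketch of what such a benchmark looks like through the Everyman neo4jphp client (an assumption on my part: the property names, index name, and autoloading shown here are illustrative; the actual script is linked in the comments below):

    <?php
    // Minimal benchmark sketch: create N indexed nodes, then M random
    // relationships among them, timing each phase. Assumes the Everyman
    // neo4jphp client and a local server on the default port.
    require 'vendor/autoload.php';

    use Everyman\Neo4j\Client;
    use Everyman\Neo4j\Index\NodeIndex;

    $client = new Client('localhost', 7474);
    $index  = new NodeIndex($client, 'nodes');

    $n = 1000; // nodes
    $m = 4000; // random relationships

    $start = microtime(true);
    $nodes = array();
    for ($i = 0; $i < $n; $i++) {
        $node = $client->makeNode(array('id' => $i))->save();
        $index->add($node, 'id', $i);
        $nodes[] = $node;
    }
    printf("Creating %d Nodes with index\nTime taken : %.2fs\n\n", $n, microtime(true) - $start);

    $start = microtime(true);
    for ($i = 0; $i < $m; $i++) {
        $a = $nodes[mt_rand(0, $n - 1)];
        $b = $nodes[mt_rand(0, $n - 1)];
        $a->relateTo($b, 'RELATED_TO')->save();
    }
    printf("Creating %d relationships\nTime taken : %.2fs\n", $m, microtime(true) - $start);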

Neo4j 1.8.1 community edition with Oracle Java 1.6.0_45:

new-sys-admin@ThinkPad:~/temp$ php perftest.php    
Creating 1000 Nodes with index 
Time taken : 67.02s

Creating 4000 relationships 
Time taken : 201.27s

Neo4j 2.0.1 enterprise edition with Oracle Java 1.7.0_51:

new-sys-admin@ThinkPad:~/temp$ php perftest.php 
Creating 1000 Nodes with index 
Time taken : 75.14s

Creating 4000 relationships 
Time taken : 206.52s

The above results are after two warm-up runs; 2.0.1 is consistently slower than 1.8.1. Any suggestions on adjusting the relevant configuration to boost Neo4j 2.0.1 performance would be highly appreciated.

EDIT 1

All queries are issued using Gremlin via the Everyman Neo4j wrapper:

http://grokbase.com/p/gg/neo4j/143w1fen8c/gremlin-plugin-extremely-slow-on-neo4j-2-0-1

In the meantime, I moved to neo4j-enterprise-edition-1.9.6 (the most recent stable release before 2.0.1) and things were back to normal.

Did you also compare with the enterprise edition of the same version, 1.8.1, or the community edition of 2.0.1? Can you please share your code too? – Michael Hunger

Not yet. I primarily wanted the enterprise edition for JMX monitoring and backup support. Do all 1.x.x enterprise editions support those? I will share the test code in a while. – new_sys_admin

Please note that I use the Everyman PHP wrapper with no Cypher or schema indexing. Here is the link to the code: codepad.org/wzN2aZJ – new_sys_admin

1 Answer


From the fact that you're using PHP, and seeing that creating just 1,000 nodes takes 67 seconds, I assume you're using the regular REST API (e.g. POST /db/data/node). If that's correct, you may be right that 2.0.1 is some percentage points slower than 1.8 for these CRUD operations; in 2.0 we focused on optimizing Cypher and the new transactional endpoint.

As such, for best performance, I'd suggest the following (a sketch combining these points follows the list):

  1. Use the new transactional endpoint, /db/data/transaction

  2. Use Cypher, and use it to send as much work as possible to the server in one go.

  3. When possible, send multiple Cypher queries in the same HTTP request; the transactional endpoint supports this as well.

  4. Make sure you re-use TCP connections if you can. I'm not sure exactly how this works in PHP, but sending a "Connection: Keep-Alive" header and re-using the same TCP connection saves significant overhead, since you don't have to re-establish connections over and over.
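
Here is a minimal sketch of points 1–4 using PHP's curl extension against the 2.0 transactional endpoint (the statements and parameter values are illustrative, not from the original script):

    <?php
    // Batch several Cypher statements into one HTTP request and commit
    // them in the same round trip via the transactional endpoint.
    $payload = json_encode(array('statements' => array(
        array('statement'  => 'CREATE (n {props}) RETURN id(n)',
              'parameters' => array('props' => array('name' => 'a'))),
        array('statement'  => 'CREATE (n {props}) RETURN id(n)',
              'parameters' => array('props' => array('name' => 'b'))),
    )));

    $ch = curl_init('http://localhost:7474/db/data/transaction/commit');
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => array(
            'Content-Type: application/json',
            'Accept: application/json',
            'Connection: keep-alive',
        ),
    ));
    $response = curl_exec($ch);
    // Re-use $ch for subsequent requests so the TCP connection stays
    // open; call curl_close($ch) only when you're done.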

Creating a thousand nodes in one Cypher query shouldn't take more than a few milliseconds. In terms of how many Cypher statements you can send per second: on my laptop, from Python (using https://github.com/jakewins/neo4jdb-python), I get about 10,000 Cypher statements per second in a concurrent setup (10 clients).
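
As a concrete example of "one go": CREATE accepts a collection parameter, so a single statement can create all thousand nodes (a sketch; the parameter name is arbitrary, and in later releases UNWIND is the idiomatic way to do this):

    <?php
    // Build a list of property maps; CREATE makes one node per map.
    $props = array();
    for ($i = 0; $i < 1000; $i++) {
        $props[] = array('id' => $i);
    }
    $payload = json_encode(array('statements' => array(
        array('statement'  => 'CREATE (n {props})',
              'parameters' => array('props' => $props)),
    )));
    // POST $payload to /db/data/transaction/commit exactly as in the
    // sketch above.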