2
votes

What is the fastest way to get all unordered nodes and relationships from a running Neo4j 2.x server into a program?

Cypher MATCH n RETURN n is too slow for my use case (say we have >10M nodes to extract).

The shell command dump seems interesting, but it requires some hack to call from source code. Are there any benchmarks of dump available?

Any advice appreciated!

--EDIT--

I execute the query through the REST endpoint of a local Neo4j server (thus no network effect) with a query like MATCH n RETURN n SKIP 0 LIMIT 50000. My db is Neo4j 2.0.3, populated with 100k nodes that each have 1 integer property, and no relationships. Computer: SSD with read speed 1.3+ MB/s and CPU i7 1.6 GHz, JVM -Xmx2g. It takes ~4 s to retrieve 50k nodes:

curl -s -w %{time_total} -d"query=match n return n limit 50000" -D- -onul: http://localhost:7474/db/data/cypher

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Access-Control-Allow-Origin: *
Content-Length: 63394503
Server: Jetty(9.0.z-SNAPSHOT)

4,047
2
How do you execute match (n) return n? The tx endpoint should be fast enough; it is more likely limited by the disk speed of loading the properties, and probably by the network. If you only need the structure, you can use match (n) return id(n) as ID. - Michael Hunger
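A minimal Java sketch of what this comment suggests: sending the ID-only query to the transactional endpoint (`/db/data/transaction/commit`, available since Neo4j 2.0). The class name `TxEndpointSketch`, the helper names, and the base URL passed on the command line are illustrative assumptions, and the payload builder only escapes quotes and backslashes, so treat it as a starting point rather than a complete client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class TxEndpointSketch {

    // Wraps a single Cypher statement in the JSON body expected by the
    // transactional endpoint. Only backslashes and double quotes are escaped,
    // which is enough for simple one-line queries (hypothetical helper).
    static String commitPayload(String cypher) {
        String escaped = cypher.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"statements\":[{\"statement\":\"" + escaped + "\"}]}";
    }

    // Posts the statement and returns the raw response stream; the caller
    // is responsible for reading and closing it.
    static InputStream run(String baseUrl, String cypher) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(baseUrl + "/db/data/transaction/commit").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Accept", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(commitPayload(cypher).getBytes(StandardCharsets.UTF_8));
        }
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        String query = "MATCH (n) RETURN id(n) AS ID";
        System.out.println(commitPayload(query));
        // Only contacts a server when a base URL is given,
        // e.g. http://localhost:7474 (assumed local instance).
        if (args.length > 0) {
            try (InputStream in = run(args[0], query)) {
                long bytes = 0;
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) bytes += n;
                System.out.println("response bytes: " + bytes);
            }
        }
    }
}
```

Returning only `id(n)` keeps the response small because no properties have to be loaded or serialized.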

2 Answers

1
votes

What you want is to enable HTTP chunked encoding (aka streaming), which allows Neo4j to start sending you results without holding them all in memory. You do this by adding the Accept: application/json;stream=true HTTP request header.

This request does the trick:

curl -i -o streamed.txt -XPOST \
  -d'{ "query":"MATCH n RETURN n" }' \
  -H 'accept:application/json;stream=true' \
  -H 'content-type:application/json' \
  'http://localhost:7474/db/data/cypher'

On a side note, if you want to start parsing the response on your side before having received the whole content (to avoid filling up your memory / hard drive), you may want to look into JSON stream parsing.
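To illustrate the idea, here is a rough Java sketch that sends the same streaming request and consumes the response character by character in constant memory, counting result rows as they arrive. It assumes the legacy Cypher endpoint returns a body shaped like `{"columns":[...],"data":[[row],[row],...]}`; the class and method names (`StreamingRowCounter`, `countRows`, `openStreamed`) are made up for this example. For real processing you would most likely plug in a proper streaming JSON parser instead of this hand-rolled scanner.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class StreamingRowCounter {

    // Counts rows without buffering the document: a row is any '[' opened
    // directly inside the top-level "data" array, i.e. at nesting depth 2.
    // Brackets inside JSON strings are ignored.
    static long countRows(Reader in) throws IOException {
        long rows = 0;
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        int c;
        while ((c = in.read()) != -1) {
            if (inString) {
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
                continue;
            }
            switch (c) {
                case '"': inString = true; break;
                case '[':
                    if (depth == 2) rows++;
                    depth++;
                    break;
                case '{': depth++; break;
                case ']':
                case '}': depth--; break;
                default: break;
            }
        }
        return rows;
    }

    // Sends the query with the streaming Accept header and returns a reader
    // over the response body as it arrives.
    static Reader openStreamed(String cypherUrl, String jsonBody) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(cypherUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Accept", "application/json;stream=true");
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
        }
        return new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument (e.g. http://localhost:7474/db/data/cypher) the
        // server is queried; otherwise a small inline sample is scanned.
        String sample = "{\"columns\":[\"n\"],\"data\":"
                + "[[{\"data\":{\"name\":\"a]\"}}],[{\"data\":{\"tags\":[1,2]}}]]}";
        try (Reader in = args.length > 0
                ? openStreamed(args[0], "{ \"query\":\"MATCH n RETURN n\" }")
                : new StringReader(sample)) {
            System.out.println("rows: " + countRows(in));
        }
    }
}
```

The same loop is where per-row work would go, so memory use stays flat regardless of how many nodes the query returns.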

2
votes

The fastest way to get all nodes is to run Neo4j embedded. The performance degradation you see using the REST API via Cypher is largely due to the data transfer limitations over the network.

Using the method getAllNodes, you can retrieve all the nodes in your graph without transferring the data over the network.

http://api.neo4j.org/current/org/neo4j/tooling/GlobalGraphOperations.html

try ( Transaction tx = db.beginTx() ) {
    // Iterate inside the transaction, while the nodes are still accessible.
    for ( Node node : db.getAllNodes() ) {
        // process node here
    }
    tx.success();
}

Note that this method is deprecated as of 2.1.2; GlobalGraphOperations.at(db).getAllNodes(), documented at the link above, is the replacement.

To learn more about Neo4j embedded, take a look at the documentation.

http://docs.neo4j.org/chunked/stable/tutorials-java-embedded.html