Neo4j database very slow to add relationships

Question

I have a Neo4j database with 7340 nodes. Each node has a label (neoplasm) and 2 properties (conceptID and fullySpecifiedName). Auto-indexing is enabled on both properties, and I have created schema indexes on neoplasm:conceptID and neoplasm:fullySpecifiedName. The nodes are concepts in a terminology tree. There is a single root node, and the others descend from it, often via several paths, to a depth of up to 13 levels. From a SQL Server implementation, the hierarchy structure is as follows...

Depth Relationship Count
0     1
1     37
2     360
3     1598
4     3825
5     6406
6     7967
7     7047
8     4687
9     2271
10    825
11    258
12    77
13    3

I am adding the relationships using a C# program and neo4jclient, which constructs and executes Cypher queries like this one...

MATCH (child:neoplasm), (parent:neoplasm)
WHERE child.conceptID = "448257000" AND parent.conceptID = "372095001"
CREATE (child)-[:ISA]->(parent)

Adding the relationships up to level 3 was very fast, and level 4 itself was not bad, but at level 5 things started getting very slow, an average of over 9 seconds per relationship.

The example query above was executed through the http://localhost:7474/browser/ interface and took 12917ms, so the poor execution times are not a feature of the C# code nor the neo4jclient API.

I thought graph databases were supposed to be blindingly fast and that the performance was independent of size.

So far I have added just 9033 out of 35362 relationships. Even if the speed does not degrade further as the number of relationships increases, it will take over three days to add the remainder!
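
As a rough check of that estimate, the arithmetic can be sketched in Python using only the figures quoted above. The constant-rate assumption and the two sample rates (9 s, the stated lower bound on the average, and 12.917 s, the single browser timing) are illustrative choices, not measurements of the whole load.

```python
# Figures quoted in the question.
TOTAL_RELATIONSHIPS = 35362
ADDED_SO_FAR = 9033
SECONDS_PER_DAY = 24 * 60 * 60

remaining = TOTAL_RELATIONSHIPS - ADDED_SO_FAR


def days_needed(seconds_per_relationship):
    """Days to add the remaining relationships at a constant rate (an assumption)."""
    return remaining * seconds_per_relationship / SECONDS_PER_DAY


print(remaining)                      # 26329
print(round(days_needed(9.0), 2))     # 2.74
print(round(days_needed(12.917), 2))  # 3.94
```

At exactly 9 seconds each, the remainder takes a little under three days; an average just under 10 seconds or more pushes it past three.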

Can anyone suggest why this performance is so bad? Or is write performance of this nature normal, and it is just read performance that is so good? A sample Cypher query to return the parents of a level 5 node returns a list of 23 fullySpecifiedName properties in less time than I can measure with a stopwatch (well under a second).

Do you have an index on :neoplasm(conceptId)? Traversals are cheap, but lookups by id still require approaches like indexing. — Tatham Oddie
To verify that the index is really used can you post the query plan printed when "PROFILE MATCH (child:neoplasm), (parent:neoplasm) WHERE child.conceptID = "448257000" AND parent.conceptID="372095001" CREATE child-[:ISA]->parent" is executed in the shell? — Stefan Armbruster

Peter Neubauer · Accepted Answer · 2013-10-23T09:39:54

When different indexes on labels are used at the same time, Cypher does not (yet) choose them to make the query faster. Instead, try giving hints to use them; see http://docs.neo4j.org/chunked/milestone/query-using.html#using-query-using-multiple-index-hints

PROFILE
MATCH (child:neoplasm), (parent:neoplasm)
USING INDEX child:neoplasm(conceptID)
USING INDEX parent:neoplasm(conceptID)
WHERE child.conceptID = "448257000" AND parent.conceptID = "372095001"
CREATE (child)-[:ISA]->(parent)

Does that improve things? Also, please post the PROFILE output for better insight.
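
For intuition on why the hints may matter, here is a toy sketch in Python. It is not Neo4j code and does not claim to reproduce the planner's actual strategy: plain dicts stand in for :neoplasm nodes, a Python dict stands in for the schema index, and the concept IDs are made up. It only contrasts checking every node against a direct lookup, and shows how many child/parent pairs exist if neither side is narrowed first.

```python
# Toy model only: plain Python objects stand in for :neoplasm nodes.
NODE_COUNT = 7340  # node count from the question

# Hypothetical concept IDs; the real ones come from the terminology tree.
nodes = [{"conceptID": str(100000 + i)} for i in range(NODE_COUNT)]


def find_by_scan(concept_id):
    """Check every node in turn, as a scan over the label with a filter would."""
    comparisons = 0
    for node in nodes:
        comparisons += 1
        if node["conceptID"] == concept_id:
            return node, comparisons
    return None, comparisons


# The index is modelled as a dict from conceptID to node.
index = {node["conceptID"]: node for node in nodes}


def find_by_index(concept_id):
    """Single hash lookup; cost does not grow with the number of nodes."""
    return index.get(concept_id)


target = nodes[-1]["conceptID"]  # worst case for the scan
scanned, comparisons = find_by_scan(target)
indexed = find_by_index(target)

print(comparisons)              # 7340
print(scanned is indexed)       # True
print(NODE_COUNT * NODE_COUNT)  # 53875600 pairs in an unfiltered child x parent product
```

Whether the real query falls into the slow case is exactly what the PROFILE output would show.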
