I have a Neo4j database with 7340 nodes. Each node has a label (neoplasm) and two properties (conceptID and fullySpecifiedName). Autoindexing is enabled on both properties, and I have created a schema index on neoplasm:conceptID and neoplasm:fullySpecifiedName. The nodes are concepts in a terminology tree. There is a single root node, and the other nodes descend from it, often via several paths, to a depth of up to 13 levels. From a SQL Server implementation, the hierarchy structure is as follows...
Depth   Relationship Count
  0            1
  1           37
  2          360
  3         1598
  4         3825
  5         6406
  6         7967
  7         7047
  8         4687
  9         2271
 10          825
 11          258
 12           77
 13            3
I am adding the relationships using a C# program and neo4jclient, which constructs and executes Cypher queries like this one...
MATCH (child:neoplasm), (parent:neoplasm)
WHERE child.conceptID = "448257000" AND parent.conceptID = "372095001"
CREATE (child)-[:ISA]->(parent)
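In case it matters, I understand an equivalent form using query parameters (which I have not tried; {childID} and {parentID} are placeholder parameter names of my own) would let Neo4j cache the query plan instead of re-parsing new literal IDs on every call:

MATCH (child:neoplasm), (parent:neoplasm)
WHERE child.conceptID = {childID} AND parent.conceptID = {parentID}
CREATE (child)-[:ISA]->(parent)

The two concept IDs would then be supplied in the parameter map of each request rather than baked into the query text.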
Adding the relationships up to level 3 was very fast, and level 4 itself was not bad, but at level 5 things started getting very slow, averaging over 9 seconds per relationship.
The example query above was executed through the http://localhost:7474/browser/ interface and took 12917ms, so the poor execution times are not caused by the C# code or the neo4jclient API.
I thought graph databases were supposed to be blindingly fast and that the performance was independent of size.
So far I have added just 9033 out of 35362 relationships. Even if the speed does not degrade further as the number of relationships increases, it will take over three days to add the remainder!
Can anyone suggest why this performance is so bad? Or is write performance of this nature normal, and it is just read performance that is so good? A sample Cypher query to return the parents of a level 5 node returns a list of 23 fullySpecifiedName properties in less time than I can measure with a stopwatch (well under a second)!
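For what it's worth, if batching the writes is the answer, I imagine something like the following UNWIND sketch could create many relationships per round trip (this is untested; {pairs} would be a parameter holding a list of maps, and childID/parentID are placeholder key names of my own):

UNWIND {pairs} AS pair
MATCH (child:neoplasm {conceptID: pair.childID})
MATCH (parent:neoplasm {conceptID: pair.parentID})
CREATE (child)-[:ISA]->(parent)

but I don't know whether that would actually address whatever is making the individual CREATE statements so slow.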