I imported data using Michael Hunger's Batch Import, which created:
4,612,893 nodes
14,495,063 properties
node properties are indexed.
5,300,237 relationships
{Question} Cypher queries are executing far too slowly: even a simple traversal takes more than 5 minutes to return a result set. Please let me know how to tune the server for better performance and what I am doing wrong.
Store details:
-rw-r--r-- 1 root root 567M Jul 12 12:42 data/graph.db/neostore.propertystore.db
-rw-r--r-- 1 root root 167M Jul 12 12:42 data/graph.db/neostore.relationshipstore.db
-rw-r--r-- 1 root root 40M Jul 12 12:42 data/graph.db/neostore.nodestore.db
-rw-r--r-- 1 root root 7.8M Jul 12 12:42 data/graph.db/neostore.propertystore.db.strings
-rw-r--r-- 1 root root 330 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.keys
-rw-r--r-- 1 root root 292 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.names
-rw-r--r-- 1 root root 153 Jul 12 12:42 data/graph.db/neostore.propertystore.db.arrays
-rw-r--r-- 1 root root 88 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index
-rw-r--r-- 1 root root 69 Jul 12 12:42 data/graph.db/neostore
-rw-r--r-- 1 root root 58 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.nodestore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.arrays.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.index.keys.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.propertystore.db.strings.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshipstore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.id
-rw-r--r-- 1 root root 9 Jul 12 12:42 data/graph.db/neostore.relationshiptypestore.db.names.id
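As a rough sanity check on the listing above, the three large store files can be compared against the record counts. The sketch below assumes the fixed record sizes that, as far as I understand, the Neo4j 1.9 store format uses (9 bytes per node, 33 per relationship, 41 per property record), and it assumes one property record per reported property. Both are assumptions on my part, not something the import tool printed. The names `COUNTS`, `RECORD_BYTES` and `expected_store_mib` are made up for this example.

```python
# Record counts reported by the batch import (taken from the post).
COUNTS = {"node": 4_612_893, "relationship": 5_300_237, "property": 14_495_063}

# ASSUMPTION: fixed record sizes in bytes for the Neo4j 1.9 store format.
RECORD_BYTES = {"node": 9, "relationship": 33, "property": 41}

MIB = 1024 * 1024


def expected_store_mib(kind):
    """Estimated store file size in MiB for one record kind."""
    return COUNTS[kind] * RECORD_BYTES[kind] / MIB


def total_core_store_mib():
    """Estimated combined size of the node, relationship and property stores."""
    return sum(expected_store_mib(kind) for kind in COUNTS)


if __name__ == "__main__":
    for kind in COUNTS:
        print(f"{kind:>12}: {expected_store_mib(kind):7.1f} MiB")
    print(f"{'total':>12}: {total_core_store_mib():7.1f} MiB")
```

Under those assumptions the estimates land just below the 40M, 167M and 567M shown by `ls`, and the three files together stay under 1 GiB, so the whole store should be small enough to fit in RAM on this machine.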
I am using
neo4j-community-1.9.1
java version "1.7.0_25"
Amazon EC2 m1.large instance with Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-40-virtual x86_64)
RAM ~8GB.
EBS 200 GB, neo4j is running on EBS volume.
Invoked as ./neo4j-community-1.9.1/bin/neo4j start
Below is the Neo4j server info:
neostore.nodestore.db.mapped_memory 161M
neostore.relationshipstore.db.mapped_memory 714M
neostore.propertystore.db.mapped_memory 90M
neostore.propertystore.db.index.keys.mapped_memory 1M
neostore.propertystore.db.strings.mapped_memory 130M
neostore.propertystore.db.arrays.mapped_memory 130M
mapped_memory_page_size 1M
all_stores_total_mapped_memory_size 500M
{Data Model} The model resembles a social graph:
User-User
User-[:FOLLOWS]->User
User-Item
User-[:CREATED]->Item
User-[:LIKE]->Item
User-[:COMMENT]->Item
User-[:VIEW]->Item
Cluster-User
User-[:FACEBOOK]->SocialLogin_Cluster
Cluster-Item
Item-[:KIND_OF]->Type_Cluster
Cluster-Cluster
Cluster-[:KIND_OF]->Type_Cluster
{Some Queries} and their execution times:
START u=node(467242)
MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b
WHERE NOT(a=b)
RETURN u,COUNT(b)
Query took 1015348ms. Returned 70956115 result count.
START a=node:nodes(kind="user")
RETURN a,length(a-[:CREATED|LIKE|COMMENT|FOLLOWS]-()) AS cnt
ORDER BY cnt DESC
LIMIT 10
Query took 231613ms
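To put the timings above in perspective, here is a small sketch that converts the reported milliseconds to minutes and works out the average row throughput of the first query. It only uses the numbers quoted above. The helper names `rows_per_second` and `minutes` are made up for this example.

```python
# Figures reported for the two queries in the post.
FIRST_QUERY_MS = 1_015_348
FIRST_QUERY_ROWS = 70_956_115
SECOND_QUERY_MS = 231_613


def minutes(elapsed_ms):
    """Convert a duration in milliseconds to minutes."""
    return elapsed_ms / 60_000.0


def rows_per_second(rows, elapsed_ms):
    """Average number of result rows produced per second."""
    return rows / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    print(f"first query : {minutes(FIRST_QUERY_MS):.1f} min")
    print(f"second query: {minutes(SECOND_QUERY_MS):.1f} min")
    rate = rows_per_second(FIRST_QUERY_ROWS, FIRST_QUERY_MS)
    print(f"first query throughput: {rate:,.0f} rows/s")
```

So the first query runs for close to 17 minutes and the second for almost 4, and the first one is producing on the order of 70,000 rows per second for the whole of that time.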
Following the suggestions, I upgraded the box to M1.xlarge and then to M2.2xlarge:
- M1.xlarge (vCPU:4,ECU:8,RAM:15 GB,Instance Storage:~600 GB)
- M2.2xlarge (vCPU:4,ECU:13,RAM:34 GB,Instance Storage:~800 GB)
I tuned the properties as shown below and am now running from instance storage (rather than EBS).
neo4j.properties
neostore.nodestore.db.mapped_memory=1800M
neostore.relationshipstore.db.mapped_memory=1800M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=150M
neostore.propertystore.db.arrays.mapped_memory=10M
neo4j-wrapper.conf
wrapper.java.additional.1=-d64
wrapper.java.additional.1=-server
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.initmemory=4098
wrapper.java.maxmemory=8192
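One thing worth double-checking in the wrapper settings above is the key naming. As far as I understand, the Java Service Wrapper expects each extra JVM flag under a uniquely numbered `wrapper.java.additional.<n>` key, so repeated or unnumbered keys may not all take effect; treat that as an assumption. The sketch below is a hypothetical lint helper (`lint` and `WRAPPER_CONF` are names made up for this example, not part of Neo4j) that flags such lines.

```python
import re
from collections import Counter

# The wrapper settings exactly as listed in the post.
WRAPPER_CONF = """\
wrapper.java.additional.1=-d64
wrapper.java.additional.1=-server
wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.initmemory=4098
wrapper.java.maxmemory=8192
"""

# ASSUMPTION: extra JVM flags must use keys of the form wrapper.java.additional.<n>.
INDEXED = re.compile(r"^wrapper\.java\.additional\.\d+$")


def lint(conf_text):
    """Return a list of warnings for duplicated or unnumbered keys."""
    keys = [
        line.split("=", 1)[0].strip()
        for line in conf_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    warnings = []
    # Keys that appear more than once: only one value can win.
    for key, count in Counter(keys).items():
        if count > 1:
            warnings.append(f"{key} is set {count} times")
    # 'additional' keys without a numeric suffix.
    for key in dict.fromkeys(keys):
        if key.startswith("wrapper.java.additional") and not INDEXED.match(key):
            warnings.append(f"{key} has no numeric index")
    return warnings


if __name__ == "__main__":
    for warning in lint(WRAPPER_CONF):
        print("WARNING:", warning)
```

Run against the listing above, this reports the duplicated `.1` key and the two unnumbered `wrapper.java.additional` lines, so it is possible that some of these JVM flags never reached the server process.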
Even so, queries like the one below still take roughly 5-8 minutes, which is not acceptable for a recommendation use case.
Query:
START u=node(467242)
MATCH u-[r1:LIKE]->a<-[r2:LIKE]-lu-[r3:LIKE]-b
RETURN u,COUNT(b)
{Profiling}
neo4j-sh (0)$ profile START u=node(467242) MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b RETURN u,COUNT(b);
==> +-------------------------+
==> | u | COUNT(b) |
==> +-------------------------+
==> | Node[467242] | 70960482 |
==> +-------------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=["u", " INTERNAL_AGGREGATEad2ab10d-cfc3-48c2-bea9-be4b9c1b5595"], returnItemNames=["u", "COUNT(b)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=["u"], aggregates=["( INTERNAL_AGGREGATEad2ab10d-cfc3-48c2-bea9-be4b9c1b5595,Count)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(u)-[r1:LIKE|COMMENT WHERE true AND true]->(a)<-[r2:LIKE|COMMENT WHERE true AND true]-(lu)-[r3:LIKE WHERE true AND true]-(b)", _rows=70960482, _db_hits=71452891)
==> ParameterPipe(_rows=1, _db_hits=0)
neo4j-sh (0)$ profile START u=node(467242) MATCH u-[r1:LIKE|COMMENT]->a<-[r2:LIKE|COMMENT]-lu-[r3:LIKE]-b RETURN count(distinct a),COUNT(distinct b),COUNT(*);
==> +--------------------------------------------------+
==> | count(distinct a) | COUNT(distinct b) | COUNT(*) |
==> +--------------------------------------------------+
==> | 1950 | 91294 | 70960482 |
==> +--------------------------------------------------+
==> 1 row
==>
==> ColumnFilter(symKeys=[" INTERNAL_AGGREGATEe6b94644-0a55-43d9-8337-491ac0b29c8c", " INTERNAL_AGGREGATE1cfcd797-7585-4240-84ef-eff41a59af33", " INTERNAL_AGGREGATEea9176b2-1991-443c-bdd4-c63f4854d005"], returnItemNames=["count(distinct a)", "COUNT(distinct b)", "COUNT(*)"], _rows=1, _db_hits=0)
==> EagerAggregation(keys=[], aggregates=["( INTERNAL_AGGREGATEe6b94644-0a55-43d9-8337-491ac0b29c8c,Distinct)", "( INTERNAL_AGGREGATE1cfcd797-7585-4240-84ef-eff41a59af33,Distinct)", "( INTERNAL_AGGREGATEea9176b2-1991-443c-bdd4-c63f4854d005,CountStar)"], _rows=1, _db_hits=0)
==> TraversalMatcher(trail="(u)-[r1:LIKE|COMMENT WHERE true AND true]->(a)<-[r2:LIKE|COMMENT WHERE true AND true]-(lu)-[r3:LIKE WHERE true AND true]-(b)", _rows=70960482, _db_hits=71452891)
==> ParameterPipe(_rows=1, _db_hits=0)
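The numbers in the two profiles above can be turned into a few ratios that show how much the pattern fans out. The sketch below uses only figures copied from the profile output. The constant names and the `ratio` helper are made up for this example.

```python
# Figures copied from the second PROFILE run in the post.
PATHS = 70_960_482       # COUNT(*), the rows leaving TraversalMatcher
DISTINCT_A = 1_950       # count(distinct a)
DISTINCT_B = 91_294      # COUNT(distinct b)
DB_HITS = 71_452_891     # _db_hits reported on TraversalMatcher


def ratio(numerator, denominator):
    """Plain division, kept as a function so each ratio is named at the call site."""
    return numerator / denominator


if __name__ == "__main__":
    print(f"paths per distinct a : {ratio(PATHS, DISTINCT_A):,.0f}")
    print(f"paths per distinct b : {ratio(PATHS, DISTINCT_B):,.1f}")
    print(f"db hits per path     : {ratio(DB_HITS, PATHS):.3f}")
```

On these figures each distinct `b` is reached through roughly 777 different paths on average, so `COUNT(b)` is counting about 71 million paths rather than about 91 thousand items, and the traversal does about one db hit per path.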
Please let me know which configuration settings and Neo4j startup arguments I should tune. Thanks in advance.