
I have Neo4j 1.9.4 installed on a 24-core, 24 GB RAM (CentOS) machine, and for most queries CPU usage spikes to 200% with only a few concurrent requests.

Domain:

A kind of social application with a few types of nodes (profiles), each with 3-30 text/array properties, and 36 relationship types, each with at least 3 properties. Most nodes currently have ~300-500 relationships.

Current data set footprint (from the console):

LogicalLogSize=4294907 (32MB)
ArrayStoreSize=1675520 (12MB)
NodeStoreSize=1342170 (10MB)
PropertyStoreSize=1739548 (13MB)
RelationshipStoreSize=6395202 (48MB)
StringStoreSize=1478400 (11MB)

which is IMHO really small. Most queries look like this one (with more or fewer WITH .. MATCH .. clauses; a few queries use variable-length relationships, but those are often fast):

START
    targetUser=node({id}),
    currentUser=node({current})
MATCH
    targetUser-[contact:InContactsRelation]->n,
    n-[:InLocationRelation]->l,
    n-[:InCategoryRelation]->c
WITH
    currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites
MATCH
    n<-[followers?:InContactsRelation]-()
WITH
    currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers
RETURN
    id(n) as id,
    n.name? as name,
    n.title? as title,
    n._class as _class,
    n.avatar? as avatar,
    n.avatar_type? as avatar_type,
    l.name as location__name,
    c.name as category__name,
    true as isInContacts,
    inFavorites as isInFavorites,
    numFollowers

It runs in ~1s-3s on the first run and ~70ms-1s on consecutive runs (depending on the query), and about 5-10 queries run for each impression. Another interesting behavior: when I run a query from the Neo4j console on my local machine many times in a row (just pressing Ctrl+Enter for a few seconds), the execution time stays almost constant, but when I do the same on the server it gets exponentially slower. I guess this is somehow related to my problem.

Problem:

So my problem is that Neo4j is very CPU-greedy (on a 24-core machine that may not be an issue, but such a machine is obviously overkill for a small project). At first I used an AWS EC2 m1.large instance, but overall performance was bad: during testing, CPU usage was always over 100%.

Some relevant parts of configuration:

neostore.nodestore.db.mapped_memory=1280M
wrapper.java.maxmemory=8192

Note: I already tried a configuration where all memory-related parameters were set HIGH, and it didn't work (no change at all).

Question:

Where should I dig? Configuration? Schema? Queries? What am I doing wrong?

If you need more info (logs, configs), just ask ;)

Have you run the query in the console with the PROFILE keyword at the front of the query? It'll spit out some information about how the query was run. - LameCoder

1 Answer


The reason for subsequent invocations of the same query being much faster can be easily explained by the usage of caches. A common strategy is to run a cache warmup query upon startup, e.g.

start n=node(*) match n--m return count(n)
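To check whether cache warmup really explains the gap, it helps to time the first run separately from the later ones. The sketch below is only an illustration: `time_runs` and `split_cold_warm` are hypothetical helper names, and `run_query` stands for whatever function your client uses to execute the Cypher statement.

```python
import time


def time_runs(run_query, repeats=5):
    """Call run_query() `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()  # assumption: executes one query and blocks until done
        durations.append(time.perf_counter() - started)
    return durations


def split_cold_warm(durations):
    """Return (first run, average of the remaining runs or None)."""
    cold = durations[0]
    warm = durations[1:]
    warm_avg = sum(warm) / len(warm) if warm else None
    return cold, warm_avg
```

If the warm average stays far below the cold run after the warmup query has been executed once at startup, caching is the likely explanation for the difference.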

200% CPU usage on a 24-core machine means the machine is mostly idle, since only 2 cores are busy. While a query is in progress, it is normal for the core running it to go to 100%.
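The arithmetic behind that statement, as a small sketch. It assumes the percentage comes from a tool such as top, where 100% corresponds to one fully busy core; the function names are made up for illustration.

```python
def busy_cores(cpu_percent):
    """Number of fully busy cores, assuming 100% means one core."""
    return cpu_percent / 100.0


def share_of_machine(cpu_percent, total_cores):
    """Fraction of the whole machine's CPU capacity that is in use."""
    return busy_cores(cpu_percent) / total_cores


print(busy_cores(200))                            # 2.0 cores busy
print(round(share_of_machine(200, 24) * 100, 1))  # 8.3 (% of the machine)
```

So the 200% reported in the question corresponds to roughly 8% of the total capacity of the 24-core box.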

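The Cypher statement in the question uses an optional match (the `followers?` relationship in the 2nd MATCH clause). Optional matches are known to be potentially slow. Check whether the runtime changes if you make this a non-optional match. Since every n is reached through an incoming InContactsRelation from targetUser, n should always have at least one follower, so dropping the `?` should not remove any rows here.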

When returning a larger result set, keep in mind that transferring the response is limited by network speed. Consider using streaming in that case; see http://docs.neo4j.org/chunked/milestone/rest-api-streaming.html.
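As a rough sketch of what a streaming request could look like from a client: the code below only builds the HTTP request and does not send it. It assumes the server listens on the default `http://localhost:7474`, uses the REST Cypher endpoint `/db/data/cypher`, and enables streaming via the `X-Stream: true` header described in the linked documentation. `build_cypher_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_cypher_request(base_url, query, params=None):
    """Build (but do not send) a POST request for the REST Cypher endpoint."""
    payload = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/db/data/cypher",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Stream": "true",  # ask the server to stream the response
        },
        method="POST",
    )


# Example with the warmup query; the URL is an assumption about your setup.
request = build_cypher_request(
    "http://localhost:7474",
    "start n=node(*) match n--m return count(n)",
)
```

Passing `request` to `urllib.request.urlopen` would then execute the query against a running server.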

You also should set wrapper.java.minmemory to the same value as wrapper.java.maxmemory.

Another approach for your rather small graph is to switch off MMIO caching and use cache_type=strong to keep the full dataset in the object cache. In this case you might need to increase wrapper.java.minmemory and wrapper.java.maxmemory.
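A back-of-the-envelope check of how the store sizes compare with the configured heap, using the MB figures reported in the question. This is only a sketch: objects in the cache take more memory than their on-disk records, so the ratio is an optimistic upper bound, and the logical log is left out on the assumption that it is a transaction log rather than cached graph data.

```python
# Store sizes in MB as reported by the console in the question.
store_sizes_mb = {
    "ArrayStore": 12,
    "NodeStore": 10,
    "PropertyStore": 13,
    "RelationshipStore": 48,
    "StringStore": 11,
}

heap_mb = 8192  # wrapper.java.maxmemory from the question

total_store_mb = sum(store_sizes_mb.values())
headroom = heap_mb / total_store_mb

print(total_store_mb)      # 94
print(round(headroom, 1))  # 87.1
```

With the heap about 87 times larger than the stores on disk, there appears to be plenty of room to hold the whole graph in the object cache, even allowing for per-object overhead.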