Low performance of neo4j

Question

I am a server engineer at a company that provides a dating service. I am currently building a PoC for our new recommendation engine and am trying to use Neo4j, but its performance does not meet our needs. I have a strong feeling that I am doing something wrong and that Neo4j can do much better. Can someone advise me on how to improve the performance of my Cypher query, or how to tune Neo4j the right way? I am using neo4j-enterprise-2.3.1 running on a c4.4xlarge instance with Amazon Linux. In our dataset each user can have four types of relationships with other users: LIKE, DISLIKE, BLOCK and MATCH. Each user also has properties such as countryCode, birthday and gender.

I imported all our users and relationships from our RDBMS into Neo4j using the neo4j-import tool, so each user is a node with properties and each reference between users is a relationship.

According to the report from the neo4j-import tool, the following were imported:

2,558,667 nodes
1,674,714,539 properties
1,664,532,288 relationships

So it's a huge database :-) Some nodes have up to 30,000 outgoing relationships.
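
A quick back-of-the-envelope calculation from these import figures gives a feel for how much the first two-hop pattern has to expand. This is a rough sketch, not a measurement: it counts all four relationship types (the query only follows LIKE and MATCH, so this overstates that part), and it assumes the users being liked are "average", whereas popular users will have far more incoming relationships. The 30,000 figure is the heaviest node mentioned above.

```python
# Figures reported by neo4j-import above.
NODES = 2_558_667
RELATIONSHIPS = 1_664_532_288

# Every relationship has exactly one start node, so this is the mean number
# of outgoing relationships per user (all four types together).
avg_out_degree = RELATIONSHIPS / NODES

# The mean in-degree equals the mean out-degree. Assumption: the users that
# `me` liked are average; in practice popular users have many more.
avg_in_degree = avg_out_degree


def two_hop_paths(first_hop: float, second_hop: float) -> float:
    """Rough number of (me)-->()<--(similar) paths: each of my first_hop
    relationships fans out into second_hop incoming ones."""
    return first_hop * second_hop


if __name__ == "__main__":
    print(f"average out-degree: {avg_out_degree:.1f}")
    print(f"typical user: {two_hop_paths(avg_out_degree, avg_in_degree):,.0f} paths")
    print(f"30,000-relationship user: {two_hop_paths(30_000, avg_in_degree):,.0f} paths")
```

Under these assumptions a typical user reaches several hundred thousand paths in two hops and the heaviest users roughly twenty million, and the query then expands a second time from the selected similar users.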

I created three indexes in Neo4j:

Indexes
ON :User(userId)           ONLINE  
ON :User(countryCode)      ONLINE  
ON :User(birthday)         ONLINE

Then I tried to build an online recommendation engine using this query:

MATCH (me:User {userId: {source_user_id} })-[:LIKE | :MATCH]->()<-[:LIKE |  :MATCH]-(similar:User)
USING INDEX me:User(userId)
USING INDEX similar:User(birthday)
WHERE similar.birthday >= {target_age_gte} AND
      similar.birthday <= {target_age_lte} AND
      similar.countryCode = {target_country_code} AND
      similar.gender = {source_gender}
WITH similar, count(*) as weight ORDER BY weight DESC 
SKIP {skip_similar_person} LIMIT {limit_similar_person}
MATCH (similar)-[:LIKE | :MATCH]-(recommendation:User)
WITH recommendation, count(*) as sheWeight
WHERE recommendation.birthday >= {recommendation_age_gte} AND
      recommendation.birthday <= {recommendation_age_lte} AND
      recommendation.gender= {target_gender}
WITH recommendation, sheWeight ORDER BY sheWeight DESC 
SKIP {skip_person} LIMIT {limit_person}
MATCH (me:User {userId: {source_user_id} })
WHERE NOT ((me)--(recommendation))
RETURN recommendation
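
To make the query's logic explicit, here is a simplified in-memory Python model of what it computes. It is not how Neo4j executes the query and is not taken from any answer; the `User` and `Graph` classes, the integer birthday encoding (for example YYYYMMDD) and the keyword arguments are hypothetical stand-ins for the graph and the Cypher parameters.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import chain
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class User:
    user_id: int
    birthday: int  # hypothetical encoding, e.g. 19900131, so ranges compare correctly
    gender: str
    country_code: str


class Graph:
    """Hypothetical in-memory stand-in for the Neo4j graph."""

    def __init__(self) -> None:
        self.users: Dict[int, User] = {}
        self.out: Dict[int, Set[int]] = defaultdict(set)  # outgoing LIKE|MATCH
        self.inc: Dict[int, Set[int]] = defaultdict(set)  # incoming LIKE|MATCH
        self.any_rel: Dict[int, Set[int]] = defaultdict(set)  # any type, any direction

    def add_user(self, user: User) -> None:
        self.users[user.user_id] = user

    def relate(self, a: int, b: int, rel_type: str) -> None:
        self.any_rel[a].add(b)
        self.any_rel[b].add(a)
        if rel_type in ("LIKE", "MATCH"):
            self.out[a].add(b)
            self.inc[b].add(a)


def _page(counts: Counter, skip: int, limit: int) -> List[int]:
    # ORDER BY count DESC SKIP skip LIMIT limit; ties are broken by id so the
    # model is deterministic (Cypher gives no such guarantee).
    ranked = sorted(counts, key=lambda k: (-counts[k], k))
    return ranked[skip:skip + limit]


def recommend(g: Graph, me: int, *, target_age: Tuple[int, int],
              target_country_code: str, source_gender: str,
              skip_similar: int, limit_similar: int,
              recommendation_age: Tuple[int, int], target_gender: str,
              skip_person: int, limit_person: int) -> List[int]:
    # 1. (me)-[:LIKE|:MATCH]->()<-[:LIKE|:MATCH]-(similar), filtered and
    #    weighted by the number of shared targets.
    weight: Counter = Counter()
    for liked in g.out[me]:
        for s in g.inc[liked]:
            u = g.users[s]
            if (s != me  # the model simply skips `me` as a similar user
                    and target_age[0] <= u.birthday <= target_age[1]
                    and u.country_code == target_country_code
                    and u.gender == source_gender):
                weight[s] += 1
    similar = _page(weight, skip_similar, limit_similar)

    # 2. (similar)-[:LIKE|:MATCH]-(recommendation), in either direction.
    she_weight: Counter = Counter()
    for s in similar:
        for r in chain(g.out[s], g.inc[s]):
            u = g.users[r]
            if (recommendation_age[0] <= u.birthday <= recommendation_age[1]
                    and u.gender == target_gender):
                she_weight[r] += 1
    page = _page(she_weight, skip_person, limit_person)

    # 3. WHERE NOT ((me)--(recommendation)) runs after SKIP/LIMIT, as in the query.
    return [r for r in page if r != me and r not in g.any_rel[me]]
```

One thing the model makes visible: `SKIP {skip_person} LIMIT {limit_person}` is applied before `WHERE NOT ((me)--(recommendation))`, so a page can come back with fewer than `limit_person` rows.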

Here is the execution plan for one of the users: plan

When I executed this query for a list of users, I got the following results:

count=2391
min=4565.128849
max=36257.170065
mean=13556.750555555178
stddev=2250.149335254768
median=13405.409811
p75=15361.353029999998
p95=17385.136478
p98=18040.900481
p99=18426.811424
p999=19506.149138
mean_rate=0.9957385490980866
m1=1.2148195797996817
m5=1.1418078036067119
m15=0.9928564378521962
rate_unit=events/second
duration_unit=milliseconds

So even the fastest run (about 4.6 seconds, with a median of about 13.4 seconds) is too slow for real-time recommendations.

Can you tell me what I am doing wrong?

Thanks.

EDIT 1: plan with the expanded boxes:

Hey Mike, can you drop me an email at michael at neo4j.com? I would love to get access to your database to help you with your query. - Michael Hunger

Max De Marzi · Accepted Answer · 2016-02-16T17:48:46

I built an unmanaged extension to see if I could do better than Cypher. You can grab it here: https://github.com/maxdemarzi/social_dna

This is a first shot; there are a couple of things we can do to speed things up. We can precalculate and save similar users, cache things here and there, and use various other tricks. Give it a shot and let us know how it goes.
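
The "precalculate and save similar users, cache things" idea can be illustrated with a small sketch. This is not code from social_dna (an unmanaged extension that runs inside Neo4j); it is a hypothetical in-memory Python illustration, `make_similarity` and `SimilarUsersCache` are invented names, and where the precomputed lists would really live (in the graph, in another store, or in application memory) is left open.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Set


def make_similarity(likes_out: Dict[int, Set[int]], limit: int) -> Callable[[int], List[int]]:
    """Hypothetical 'similar users' function: rank other users by how many
    liked users they share with user_id (the first step of the query)."""
    # Reverse index built once; rebuild it if likes_out changes.
    likes_in: Dict[int, Set[int]] = defaultdict(set)
    for user, targets in likes_out.items():
        for target in targets:
            likes_in[target].add(user)

    def compute(user_id: int) -> List[int]:
        weight: Counter = Counter()
        for liked in likes_out.get(user_id, ()):
            for other in likes_in.get(liked, ()):
                if other != user_id:
                    weight[other] += 1
        ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
        return [other for other, _ in ranked[:limit]]

    return compute


class SimilarUsersCache:
    """Stores each user's ranked similar users so the expensive two-hop
    step runs once per user instead of once per request."""

    def __init__(self, compute: Callable[[int], List[int]]) -> None:
        self._compute = compute
        self._cache: Dict[int, List[int]] = {}
        self.misses = 0  # how often get() had to recompute

    def get(self, user_id: int) -> List[int]:
        if user_id not in self._cache:
            self.misses += 1
            self._cache[user_id] = self._compute(user_id)
        return self._cache[user_id]

    def invalidate(self, user_id: int) -> None:
        # Call when user_id adds or removes a LIKE/MATCH. Other users' lists can
        # also go stale; a periodic precompute() bounds that staleness.
        self._cache.pop(user_id, None)

    def precompute(self, user_ids: Iterable[int]) -> None:
        # Batch job (for example nightly) that warms the cache ahead of requests.
        for user_id in user_ids:
            self._cache[user_id] = self._compute(user_id)
```

The design choice is to move the costly first step off the request path, so a request usually only needs the second hop from a small, already ranked set of similar users.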

Regards, Max
