Optimize neo4j cypher query with very large dataset

Question

I'm trying to optimize a Cypher query on a very large dataset. I want to find 2nd- or 3rd-degree friends who live in the same city. My current query, which takes over 1 minute to run, is:

match (n:User {id: 123})-[:LIVES_IN]->()<-[:LIVES_IN]-(u:User), (n)-[:FRIENDS_WITH*2..3]-(u) WHERE u.age >= 20 AND u.age <= 36 return u limit 100

There are approximately 500K User nodes and 500M FRIENDS_WITH relationships. I already have indexes on the id and age properties. The query seems to be choking on the FRIENDS_WITH requirement. Is there a different way to approach this, or a way to optimize the Cypher, so that it runs in real time (i.e., 1-2 seconds at most)?

Here's the profile of the query:

[Image: query profile, hosted on Imgur]

Thanks.

Do you have a test database that you could export to run cypher queries against? — manonthemat

František Hartman · Accepted Answer · 2016-02-01T17:46:04

Create an index on the id property for the User label:

CREATE INDEX ON :User(id)

See the documentation on schema indexes for more information: http://neo4j.com/docs/stable/query-schema-index.html

If that doesn't help, add the output of a PROFILE run of the query and we might be able to help further:

PROFILE MATCH ... rest of your query
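For the query in the question, that would look roughly like the following. PROFILE executes the query and reports rows and db hits per operator, so on this dataset it may take as long as the original query does.

```cypher
// PROFILE runs the query and returns the executed plan with per-operator statistics.
PROFILE
MATCH (n:User {id: 123})-[:LIVES_IN]->()<-[:LIVES_IN]-(u:User),
      (n)-[:FRIENDS_WITH*2..3]-(u)
WHERE u.age >= 20 AND u.age <= 36
RETURN u
LIMIT 100
```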

It might also be worth rewriting the query as follows:

MATCH (n:User {id: 123})-[:LIVES_IN]->()<-[:LIVES_IN]-(u:User),
      (n)-[:FRIENDS_WITH*2..3]-(u)
WHERE u.age >= 20 AND u.age <= 36
RETURN u
LIMIT 100
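One further variation, offered as an untested sketch rather than a confirmed fix: bind the city once, then expand the friendships and check city membership as a predicate on each candidate. The variable name `city` is an assumption (the question does not name the label or variable for that node), and `DISTINCT` and `u <> n` are additions that are not in the original query. With about 500M relationships over 500K users, each user has well over a thousand friends on average, so a 3-hop expansion is likely to dominate the cost however the pattern is written. Running it under PROFILE would show whether it helps at all.

```cypher
// Sketch only: not benchmarked against the dataset in the question.
// Anchor the start user and their city first (variable name `city` is assumed).
MATCH (n:User {id: 123})-[:LIVES_IN]->(city)
// Expand 2 to 3 FRIENDS_WITH hops from the anchored user.
MATCH (n)-[:FRIENDS_WITH*2..3]-(u:User)
// Keep candidates in the age range who live in the same city,
// excluding the start user, who can be reached again via a 3-hop cycle.
WHERE u.age >= 20 AND u.age <= 36
  AND u <> n
  AND (u)-[:LIVES_IN]->(city)
// DISTINCT because the same user can be reached along many different paths.
RETURN DISTINCT u
LIMIT 100
```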
