Neo4j Cypher performance with multiple start nodes

Question

http://console.neo4j.org/r/8mkc4z

In the graph above, the purpose of the query

start n=node(1) match n-[:KNOWS]->m-[:KNOWS]->p where p.name='Cypher' return n, m, p

is to find m such that Neo knows m and m knows Cypher.

The same result can be achieved with the following query:

start n=node(1), p=node(4) match n-[:KNOWS]->m-[:KNOWS]->p return n, m, p

The first one uses a where condition and the second one uses multiple start nodes.

From a performance perspective, which one should run faster, and in what scenarios?

I have faced performance issues with multiple start nodes, whereas I would think that, logically, specifying p as a start node rather than filtering it in a where condition should be faster.

Are there any rules on which approach to use in different scenarios?

Cypher uses the new, bi-directional pattern matcher for two start points in Neo4j 1.9.M01; you might want to try that out and report back. - Michael Hunger

Michael Hunger · Accepted Answer · 2012-09-02T21:29:21

So far we've worked on Cypher the language, adding updating features in 1.8.

In Neo4j 1.9 we will focus on Cypher performance.

So far, pattern matchers with a single start point are faster than ones with multiple start points. Still, if the filtering is done only after the fact (as in your first query), the single-start query may end up slower, depending on the result volume.
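To make the trade-off concrete, here is a toy in-memory sketch in Python. It is not how Neo4j's pattern matcher is implemented; it only contrasts "expand from one start node, then filter on the name" with "fix both end nodes and look for the nodes in between". The node ids, names, and edges are made-up sample data, and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(edges):
    """Return (outgoing, incoming) adjacency maps for directed KNOWS edges."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_edges[src].add(dst)
        in_edges[dst].add(src)
    return out_edges, in_edges

def expand_then_filter(out_edges, names, start, target_name):
    """Single start node: walk two hops out, filter on the name at the end."""
    paths_examined = 0
    matches = []
    for m in sorted(out_edges[start]):
        for p in sorted(out_edges[m]):
            paths_examined += 1  # every two-hop path is built before filtering
            if names[p] == target_name:
                matches.append((start, m, p))
    return matches, paths_examined

def meet_in_the_middle(out_edges, in_edges, start, end):
    """Two fixed end nodes: intersect start's successors with end's predecessors."""
    middles = out_edges[start] & in_edges[end]
    return [(start, m, end) for m in sorted(middles)]

# Hypothetical sample data, not the graph from the console link.
names = {1: "Neo", 2: "Morpheus", 3: "Trinity", 4: "Cypher", 5: "Smith"}
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5)]
out_edges, in_edges = build_index(edges)

filtered, paths_examined = expand_then_filter(out_edges, names, 1, "Cypher")
anchored = meet_in_the_middle(out_edges, in_edges, 1, 4)
```

In this model the first approach touches every two-hop path before discarding most of them, so its cost grows with the result volume before filtering. Whether a real engine benefits from the second start node depends on how its matcher handles multiple start points, which is exactly what the answer says was still being worked on.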

But that will change in the course of the next release. I think the best tip I can give you so far is to profile your queries with realistic datasets (write data generators if you don't have the expected data yet).
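As a rough illustration of the "write data generators" tip, the following Python sketch produces a reproducible random set of KNOWS edges and includes a small timing helper. The function names, parameters, and graph shape are assumptions made for illustration; loading the edges into Neo4j and running the actual Cypher queries is left out, since that depends on your setup.

```python
import random
import time

def generate_knows_edges(node_count, edges_per_node, seed=42):
    """Generate a reproducible list of directed KNOWS edges as (src, dst) pairs."""
    rng = random.Random(seed)
    fan_out = min(edges_per_node, node_count - 1)
    edges = []
    for src in range(node_count):
        # Sample distinct targets from the other nodes, skipping src itself.
        for pick in rng.sample(range(node_count - 1), fan_out):
            dst = pick if pick < src else pick + 1
            edges.append((src, dst))
    return edges

def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - started)
    return best

# Example: 100 nodes, each knowing 3 others.
sample_edges = generate_knows_edges(100, 3)
```

Pass `time_call` a zero-argument function that runs one of the two queries against your test database, then compare the timings for both variants at several data sizes. Taking the best of several runs reduces noise from caching and warm-up, but it is only a coarse measurement.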
