While using the Neo4j graph database with the Cypher query language, I found that some computationally expensive queries run faster when "LIMIT {number}" is added to the end of the query. I also came across a few other queries that run forever even with LIMIT 1 added. (This happened when I was executing a variable-length path query.)
As stated in the specification, LIMIT constrains the number of rows in the output. My understanding, then, is that the query runs to completion and only some of the rows are output, in which case computationally expensive queries should not be affected by the limit at all. How exactly does LIMIT work in a Neo4j Cypher query?
Answer
First and foremost, it is worth mentioning that Cypher is a declarative language, meaning that you say what you want, not how to get it. In a simple MATCH (d:Cat)-[o:OWNS]->(g:Person), Cypher is free to start its search on d, o, or g, and then expand and filter as it likes to find all matches. So how a Cypher query runs is not guaranteed, only the results it produces. A Cypher query's performance will also vary with the Neo4j version you run it on, since a given version's planner may decide to do smarter or dumber things.
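To make the "what, not how" point concrete, here is a toy Python model (not Neo4j's planner) of two evaluation orders for the pattern (d:Cat)-[:OWNS]->(g:Person): one starts from the cats, the other from the people. The graph data and function names are hypothetical. Both plans do different work but must return the same set of matches, which is all the query actually promises.

```python
# Toy model (NOT Neo4j's planner): two evaluation orders for the pattern
# (d:Cat)-[:OWNS]->(g:Person) over a tiny hypothetical graph.
from typing import Dict, List, Set, Tuple

# Hypothetical sample data: node labels and OWNS relationships (src, dst).
LABELS: Dict[str, str] = {"tom": "Cat", "felix": "Cat", "alice": "Person", "bob": "Person"}
OWNS: List[Tuple[str, str]] = [("tom", "alice"), ("felix", "bob"), ("felix", "alice")]


def plan_start_from_cats() -> Set[Tuple[str, str]]:
    """Scan Cat nodes first, then follow outgoing OWNS and keep Person targets."""
    matches: Set[Tuple[str, str]] = set()
    for d, label in LABELS.items():
        if label != "Cat":
            continue
        for src, dst in OWNS:
            if src == d and LABELS[dst] == "Person":
                matches.add((d, dst))
    return matches


def plan_start_from_persons() -> Set[Tuple[str, str]]:
    """Scan Person nodes first, then follow incoming OWNS and keep Cat sources."""
    matches: Set[Tuple[str, str]] = set()
    for g, label in LABELS.items():
        if label != "Person":
            continue
        for src, dst in OWNS:
            if dst == g and LABELS[src] == "Cat":
                matches.add((src, g))
    return matches


if __name__ == "__main__":
    # Different work, same answer: the query fixes the result set, not the strategy.
    print(plan_start_from_cats() == plan_start_from_persons())  # True
```

Sets are used deliberately: like Cypher without ORDER BY, neither plan promises any particular row order.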
If a query has the LIMIT keyword but EVERYTHING must be computed before an accurate limit can be taken, LIMIT can only help by limiting the rows passed on to later matches. If you just need the "first 5 cats", MATCH (n:Cat) RETURN n LIMIT 5, then once Cypher has matched 5 cats it knows it will throw out anything more, so it can stop after 5 matches instead of matching all cats and then taking 5. If you do MATCH (n:Cat) RETURN n ORDER BY n.name DESC LIMIT 5, then, since which 5 cats come first depends on the sort, Cypher has no choice but to load all cats, sort them, and then limit to the first 5.
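The difference between the two queries above can be sketched with a toy Python model of a lazy, pull-based row pipeline. This is an illustration under that assumption, not Neo4j's actual runtime, and `CountingScan`, `limit_only` and `order_then_limit` are hypothetical names. The scan counts how many rows are pulled from it. A plain LIMIT stops pulling after 5 rows. ORDER BY ... LIMIT has to see every row. Here `heapq.nlargest` keeps only the best 5 while scanning, roughly like a top-N step, but it still reads all of them.

```python
# Toy model of a lazy (pull-based) row pipeline; NOT Neo4j's actual runtime.
import heapq
from itertools import islice
from typing import Dict, Iterator, List


class CountingScan:
    """Stands in for a label scan such as MATCH (n:Cat); counts rows pulled."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows
        self.pulled = 0

    def __iter__(self) -> Iterator[Dict[str, str]]:
        for row in self.rows:
            self.pulled += 1  # one more row handed to the next operator
            yield row


def limit_only(scan: CountingScan, n: int) -> List[Dict[str, str]]:
    # MATCH (n:Cat) RETURN n LIMIT 5: stop pulling as soon as n rows arrived.
    return list(islice(scan, n))


def order_then_limit(scan: CountingScan, n: int) -> List[Dict[str, str]]:
    # MATCH (n:Cat) RETURN n ORDER BY n.name DESC LIMIT 5: every row must be
    # seen before we know which n come first in descending name order.
    return heapq.nlargest(n, scan, key=lambda row: row["name"])


if __name__ == "__main__":
    cats = [{"name": f"cat{i:03d}"} for i in range(1000)]  # hypothetical data

    scan = CountingScan(cats)
    limit_only(scan, 5)
    print(scan.pulled)  # 5

    scan = CountingScan(cats)
    order_then_limit(scan, 5)
    print(scan.pulled)  # 1000
```

Whether a real query short-circuits like the first case depends on the plan Neo4j picks. PROFILE is the way to check.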
You should apply LIMIT as early in the query as possible. You can check how it affects execution by putting PROFILE in front of the query, which reports the rows and db hits for each operator in the plan. – Tomaž Bratanič
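Why limiting early pays off when a sort forces every row to be read can be sketched with another toy Python model (again not Neo4j's planner; `Expander`, `limit_late`, `limit_early` and the data are hypothetical). The assumed goal is "the first 5 cats by name and their owners". Roughly, this is MATCH (c:Cat) OPTIONAL MATCH (c)-[:OWNS]->(p) RETURN c, collect(p) ORDER BY c.name LIMIT 5 versus MATCH (c:Cat) WITH c ORDER BY c.name LIMIT 5 OPTIONAL MATCH (c)-[:OWNS]->(p) RETURN c, collect(p). Both return the same rows, but the second runs the expansion for only 5 cats.

```python
# Toy model (NOT Neo4j's planner): limiting before an expensive expansion.
from typing import Dict, List, Tuple

# Hypothetical graph: each cat's outgoing OWNS targets.
OWNERS: Dict[str, List[str]] = {
    f"cat{i:03d}": [f"person{i}", f"person{i + 1}"] for i in range(1000)
}


class Expander:
    """Stands in for an expand step like (c)-[:OWNS]->(p); counts its calls."""

    def __init__(self, graph: Dict[str, List[str]]) -> None:
        self.graph = graph
        self.calls = 0

    def owners(self, cat: str) -> List[str]:
        self.calls += 1
        return list(self.graph[cat])


def limit_late(expand: Expander, n: int) -> List[Tuple[str, List[str]]]:
    # Expand every cat, then sort and keep the first n rows.
    rows = [(cat, expand.owners(cat)) for cat in expand.graph]
    rows.sort(key=lambda row: row[0])
    return rows[:n]


def limit_early(expand: Expander, n: int) -> List[Tuple[str, List[str]]]:
    # Sort and limit the cheap cat rows first, then expand only n of them.
    first = sorted(expand.graph)[:n]
    return [(cat, expand.owners(cat)) for cat in first]


if __name__ == "__main__":
    late, early = Expander(OWNERS), Expander(OWNERS)
    same = limit_late(late, 5) == limit_early(early, 5)
    print(same, late.calls, early.calls)  # True 1000 5
```

OPTIONAL MATCH is used in both Cypher forms so that cats without owners are kept in either case. With a plain MATCH the two forms could return different rows.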