I am getting started with Neo4j and wondering about finding the nodes connected to another node at a length of at most k edges (friend of a friend of a friend ... up to k times). I will illustrate with the example from the tutorial in Neo4j itself (I put graph creation commands are at the bottom).
match (e {name:"Emil"})-[*1..2]-(p)
return DISTINCT e, p;
This query would return nodes connected to Emil and nodes connected to those nodes. My issue is that it seems this enumerates EVERY path of length 1-2 from Emil, though I don't care about enumerating all paths. From the query, it's clear I only care about the nodes connected to Emil at that distance and enumerating all possible paths is a poor way to realize that query. This is an issue in large, dense graphs as the complexity becomes overwhelming.
Remove DISTINCT and there will be 8 records, which is the number of unique 1-2 length paths from Emil. Based on my testing on larger graphs, it seems DISTINCT is a post-processing step that does not affect the runtime of the query, though it will eliminate the redundant output. Is that correct?
My main question, is there a way to form a Cypher query for this problem so that I am not traversing all the unique paths and the complexity can be reduced? Please let me know if I have a misunderstanding somewhere as well.
---- Commands to create the graph -----
CREATE (ee:Person { name: "Emil", from: "Sweden", klout: 99 })
CREATE (js:Person { name: "Johan", from: "Sweden", learn: "surfing" }),
(ir:Person { name: "Ian", from: "England", title: "author" }),
(rvb:Person { name: "Rik", from: "Belgium", pet: "Orval" }),
(ally:Person { name: "Allison", from: "California", hobby: "surfing" }),
(ee)-[:KNOWS {since: 2001}]->(js),(ee)-[:KNOWS {rating: 5}]->(ir),
(js)-[:KNOWS]->(ir),(js)-[:KNOWS]->(rvb),
(ir)-[:KNOWS]->(js),(ir)-[:KNOWS]->(ally),
(rvb)-[:KNOWS]->(ally)