0
votes

I am getting started with Neo4j and wondering about finding the nodes connected to another node at a length of at most k edges (friend of a friend of a friend ... up to k times). I will illustrate with the example from the tutorial in Neo4j itself (I put graph creation commands are at the bottom).

match (e {name:"Emil"})-[*1..2]-(p)
return DISTINCT e, p;

This query would return nodes connected to Emil and nodes connected to those nodes. My issue is that it seems this enumerates EVERY path of length 1-2 from Emil, though I don't care about enumerating all paths. From the query, it's clear I only care about the nodes connected to Emil at that distance and enumerating all possible paths is a poor way to realize that query. This is an issue in large, dense graphs as the complexity becomes overwhelming.

Remove DISTINCT and there will be 8 records, which is the number of unique 1-2 length paths from Emil. Based on my testing on larger graphs, it seems DISTINCT is a post-processing step that does not affect the runtime of the query, though it will eliminate the redundant output. Is that correct?

My main question, is there a way to form a Cypher query for this problem so that I am not traversing all the unique paths and the complexity can be reduced? Please let me know if I have a misunderstanding somewhere as well.

---- Commands to create the graph -----

CREATE (ee:Person { name: "Emil", from: "Sweden", klout: 99 })
CREATE (js:Person { name: "Johan", from: "Sweden", learn: "surfing" }),
(ir:Person { name: "Ian", from: "England", title: "author" }),
(rvb:Person { name: "Rik", from: "Belgium", pet: "Orval" }),
(ally:Person { name: "Allison", from: "California", hobby: "surfing" }),
(ee)-[:KNOWS {since: 2001}]->(js),(ee)-[:KNOWS {rating: 5}]->(ir),
(js)-[:KNOWS]->(ir),(js)-[:KNOWS]->(rvb),
(ir)-[:KNOWS]->(js),(ir)-[:KNOWS]->(ally),
(rvb)-[:KNOWS]->(ally)
1

1 Answers

0
votes

Max De Marzi at Neo wrote a great blog post about these kinds of queries, so variable-length matches where the only interest is for distinct nodes, not paths, will be detected and optimized by the query planner in a future release, likely 3.2.

In the meantime, APOC Procedures has a solution in their apoc.path.expandConfig() call, when you supply 'NODE_GLOBAL' as the uniqueness parameter.

This will ensure nodes found during the path expansion are only visited once, so you shouldn't see multiple paths to the same node.

match (e {name:"Emil"})
call apoc.path.expandConfig(e, {maxLevel:2, uniqueness:'NODE_GLOBAL'}) yield path
WITH e, last(nodes(path)) as subgraph
where e <> subgraph
return  e, subgraph;