
I imported a large set of nodes (more than 16,000), where each node contains information about a location (longitude/latitude geo-data). All nodes have the same label, and there are no relationships in this scenario. Now I want to identify, for each node, its nearest neighbour by distance and create a relationship between these two nodes.

This brute-force way worked well for sets of about 1,000 nodes: (1) First I created relationships between all pairs of nodes, each carrying the distance as a property. (2) Then I set the property "mindist=false" on every relationship. (3) After that I identified the nearest neighbour by comparing the distance values for each node's relationships and set "mindist" to "true" where the relationship represents the shortest distance. (4) Finally I deleted all relationships with "mindist=false".

(1)

MATCH (n1:XXX), (n2:XXX)
WHERE id(n1) <> id(n2)
WITH n1, n2, distance(n1.location, n2.location) AS dist
CREATE (n1)-[R:DISTANCE {dist: dist}]->(n2)
RETURN R

(2)

MATCH (n1:XXX)-[R:DISTANCE]->(n2:XXX)
SET R.mindist = false
RETURN R.mindist

(3)

MATCH (n1:XXX)-[R:DISTANCE]->(n2:XXX)
WITH n1, min(R.dist) AS mindist
MATCH (o1:XXX)-[r:DISTANCE]->(o2:XXX)
WHERE o1.name = n1.name AND r.dist = mindist
SET r.mindist = true
RETURN r

(4)

MATCH (n)-[R:DISTANCE]->()
WHERE R.mindist = false
DELETE R
RETURN n

With sets of about 16,000 nodes this solution didn't work (memory problems ...). I am sure there is a smarter way to solve this problem, but at this point I am still short on experience working with Neo4j/Cypher. ;-)
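The memory blow-up is easy to see outside Neo4j: steps (1)-(4) materialise every pairwise distance before picking the minimum, so the intermediate state grows quadratically with the node count. A minimal Python sketch of the same logic (hypothetical coordinates, with plain Euclidean distance standing in for Cypher's distance()):

```python
import math

def nearest_by_materialising(points):
    """Mirror steps (1)-(4): build every pairwise 'relationship',
    then keep only the one with the minimum distance per node."""
    # Step (1): one entry per ordered pair -- n*(n-1) of them.
    pairs = [(i, j, math.dist(points[i], points[j]))
             for i in range(len(points))
             for j in range(len(points)) if i != j]
    # Steps (3)+(4): keep, for each source node, the pair with min distance.
    best = {}
    for i, j, d in pairs:
        if i not in best or d < best[i][1]:
            best[i] = (j, d)
    return best, len(pairs)

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
best, n_pairs = nearest_by_materialising(points)
# 3 nodes already yield 6 pairs; 16,000 nodes would yield ~256 million.
```

The pairs list is exactly the set of DISTANCE relationships from step (1), and its quadratic size is what exhausts memory at 16,000 nodes.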

What should happen if there are two neighbors at the same distance? – Rajendra Kadam
In this theoretical case there should be two or more relationships. (But I am using real geo-data with more than 10 digits after the decimal point, so I think this scenario will not occur.) – CB_Dev
Do you mind how long it takes to run? – Pablissimo
For 1,090 nodes, step 1 completed after roughly 10 seconds. Steps 2-4 took many minutes (sorry, I don't know exactly) ... finally it worked, but not efficiently :-( – CB_Dev

2 Answers


You can find the closest neighbor for each node, one by one, in batches using APOC. (This is also a brute-force approach, but it runs faster.) It takes around 75 seconds for 7,322 nodes.

CALL apoc.periodic.iterate("
MATCH (n1:XXX)
RETURN n1
", "
MATCH (n2:XXX)
WHERE id(n1) <> id(n2)
WITH n1, n2, distance(n1.location, n2.location) AS dist ORDER BY dist LIMIT 1
CREATE (n1)-[r:DISTANCE {dist: dist}]->(n2)
", {batchSize: 1, parallel: true, concurrency: 10})

NOTE: batchSize should always be 1 in this query. You can vary concurrency for experimentation.
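The reason this uses far less memory than the question's approach: ORDER BY dist LIMIT 1 keeps only a running minimum per node instead of materialising all pairwise relationships first. The same idea in plain Python (hypothetical data, Euclidean distance standing in for distance()):

```python
import math

def nearest_streaming(points, i):
    """Equivalent of the inner query: scan every other node once,
    keeping only the current closest -- O(1) extra memory per node."""
    best_j, best_d = None, float("inf")
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
```

Total work is still O(n²) across all nodes, but nothing quadratic is ever held in memory, which is why this completes where the four-step version runs out of heap.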


Our options within Cypher are, I think, limited to a naive O(n^2) brute-force check of the distance from every node to every other node. If you were to write some custom Java to do it (which you could expose as a Neo4j plugin), you could do the check much more quickly.
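To make "much quicker" concrete: the usual trick is a spatial bucketing structure, so each node is compared only against points in nearby cells rather than against every other point. This is a hypothetical sketch in Python rather than a Neo4j plugin, but the same idea carries over directly to Java: hash points into grid cells, then scan cells in expanding rings around each query point until no farther cell can possibly hold a closer point.

```python
import math
import random
from collections import defaultdict

def build_grid(points, cell):
    """Bucket point indices by integer grid cell."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def nearest(points, grid, cell, qi):
    """Nearest neighbour of points[qi], scanning grid cells in
    expanding square rings around the query point's own cell."""
    qx, qy = points[qi]
    cx, cy = int(qx // cell), int(qy // cell)
    rmax = max(max(abs(gx - cx), abs(gy - cy)) for gx, gy in grid)
    best_j, best_d = None, float("inf")
    for r in range(rmax + 1):
        # Every point in a ring-r cell is at least (r - 1) * cell away,
        # so once that lower bound exceeds best_d we can stop early.
        if best_j is not None and (r - 1) * cell > best_d:
            break
        for gx in range(cx - r, cx + r + 1):
            for gy in range(cy - r, cy + r + 1):
                if max(abs(gx - cx), abs(gy - cy)) != r:
                    continue  # only visit cells on the ring boundary
                for j in grid.get((gx, gy), ()):
                    if j != qi:
                        d = math.dist(points[qi], points[j])
                        if d < best_d:
                            best_j, best_d = j, d
    return best_j, best_d

random.seed(1)
pts = [(random.random() * 100, random.random() * 100) for _ in range(200)]
grid = build_grid(pts, 10.0)
```

With the cell size tuned to the data density, each query touches only a handful of cells, so the all-pairs scan drops to roughly linear total work for evenly distributed points.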

Still, you can do it with arbitrary numbers of nodes in the graph without blowing out the heap if you use APOC to split the query up into multiple transactions. Note: you'll need to add the APOC plugin to your install.

Let's first create 20,000 points of test data:

WITH range(1, 20000) as ids
WITH [x in ids | { id: x, loc: point({ x: rand() * 100, y: rand() * 100 }) }] as points
UNWIND points as pt
CREATE (p: Point { id: pt.id, location: pt.loc })

We'll probably want a couple of indexes too:

CREATE INDEX ON :Point(id)
CREATE INDEX ON :Point(location)

In general, the following query (don't run it yet...) would, for each Point node, build a list containing the ID of and distance to every other Point node in the graph, sort that list so the nearest one is at the top, take the first item from the list, and create the corresponding relationship.

MATCH (p: Point)
MATCH (other: Point) WHERE other.id <> p.id
WITH p, [x in collect(other) | { id: x.id, dist: distance(p.location, x.location) }] AS dists
WITH p, head(apoc.coll.sortMaps(dists, '^dist')) AS closest
MATCH (closestPoint: Point { id: closest.id })
MERGE (p)-[:CLOSEST_TO]->(closestPoint)

However, the first two lines there cause a cartesian product of nodes in the graph: for us, it's 400 million rows (20,000 * 20,000) that flow into the rest of the query all of which is happening in memory - hence the blow-up. Instead, let's use APOC and apoc.periodic.iterate to split the query in two:

CALL apoc.periodic.iterate(
"
    MATCH (p: Point)
    RETURN p
",
"
    MATCH (other: Point) WHERE other.id <> p.id
    WITH p, [x in collect(other) | { id: x.id, dist: distance(p.location, x.location) }] AS dists
    WITH p, head(apoc.coll.sortMaps(dists, '^dist')) AS closest
    MATCH (closestPoint: Point { id: closest.id })
    MERGE (p)-[:CLOSEST_TO]->(closestPoint)
", { batchSize: 100 })

The first query just returns all Point nodes. apoc.periodic.iterate will then take the 20,000 nodes from that query and split them up into batches of 100 before running the inner query on each of the nodes in each batch. We'll get a commit after each batch, and our memory usage is constrained to whatever it costs to run the inner query.
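The batching behaviour can be pictured as a simple chunked loop. This hypothetical Python sketch shows the shape of it, with a plain function call standing in for the inner query and a commit boundary per batch (the names here are illustrative, not APOC's actual implementation):

```python
def periodic_iterate(items, inner, batch_size=100):
    """Process items in fixed-size batches; each batch is one 'transaction',
    so peak memory is bounded by the cost of a single batch."""
    batches = 0
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            inner(item)
        batches += 1  # commit point in the real apoc.periodic.iterate
    return batches

seen = []
n_batches = periodic_iterate(list(range(20000)), seen.append, batch_size=100)
```

For 20,000 nodes and batchSize 100 that is 200 committed transactions, each holding at most 100 nodes' worth of intermediate state.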

It's not quick, but it does complete. On my machine it's processing about 12 nodes a second on a graph with 20,000 nodes, but the cost grows quadratically as the number of nodes in the graph increases. You'll rapidly hit the point where this approach just doesn't scale well enough.