I have a Neo4J instance running with the Neo4J Spatial plugin. In it, I have a graph with around 3.5 k nodes each with the same label, we'll call Basket. Each Basket relates to a physical location in the same city, and the density of these baskets is very variable. I have calculated walking times between each Basket and all of its neighbours within 600m, and stored these as non-spatial (directed) relationships between nodes. Thus, some Baskets exist as what seems to be part of a big cluster, and others exist almost on their own, with only one or almost no relationships to other Baskets.
My users have a problem: they wish to begin in one place, and end in another place, visiting an arbitrary, user-defined, number of Baskets along the way. My program aims to provide a few route options for the user (as a sequence of nodes - I'll sort the actual how-to-walk-there part later), calculating the n-th number of shortest paths.
I've written a cypher query to do this, below.
start a = node(5955), b=node(6497)
WITH a,b
MATCH p=((a)-[r:IS_WALKABLE_TO*4..5]->(b))
RETURN p
N.B. - nodes 5955
and 6497
are two nodes I picked about 2 miles apart, in this instance I decided to opt for between 4 and 5 baskets along the way.
However, I keep running into an out of memory exception, and so would like advice on how to reduce the memory demand for this problem to make it perform on an affordable server in an acceptable time of 1 to 6 seconds.
My understanding is that Neo4j would not perform a Cartesian Product to find the solution, but kind of "pick each node and sniff around from each one until it finds a suitable-sized connection" (please, forgive my phrasing!), so I'm confused about the heap memory error.
My thoughts for improving the program are to:
Somehow restrict the path-finding part of the query to nodes within a bounding box, determined by the placing of the start and end node (i.e., add 500 m in each direction, then limit the query to these nodes). However, I can't find any documentation on how to do this - is it possible without having to create another spatial layer for each query?
Re-write the query in a way which doesn't create a memory error - is this doable easily?
Stop using Neo4J for this entirely and write an algorithm to do it manually using an alternative language. If so, what language would you recommend? C? C++ / C#? Or could I stick with Python / Ruby / Java / Go? (or, I was even thinking I might be able to do it in PHP quite effectively but I'm not sure if that was a moment of madness).
Any help and advice about how to tackle this much appreciated!
:IS_WALKABLE_TO
so you consider ALL relationships between your nodes. – Michael Hunger