This is my first project using Neo4j and the associated spatial plugin. Performance is well below what I was expecting and below what this project needs. As a noob I may be missing or misunderstanding something. Help is appreciated and needed.

I am seeing very slow response times from Neo4j and the Spatial plugin when finding the OSM ways surrounding a point specified by lat/lon, in order to process GPS readings from a driven trip. I am calling spatial.closest('layer', {lon, lat}, 0.01), which takes 6-11 seconds to process and returns approximately 25-100 nodes.

I am running Neo4j Community Edition 3.0.4 and spatial 0.20 on a MacBook Pro with 16GB RAM and a 512GB SSD. The OSM data is massachusetts-latest.osm (Massachusetts, USA). I am accessing it via Bolt and Cypher. Instrumented testing has been done from the browser client, a Python client, and a Java client, as well as with a custom version of spatial that reports timing for the spatial stored procedure. The Neo4j database is approximately 44GB in size and contains 76.5M nodes and 118.2M relationships. The schema and data are 'as-is' from the OSMImport.

To isolate the performance problem I added a custom version of spatial.closest( ) named spatial.timedClosest( ). The timedClosest() stored procedure takes the same input and makes the same calls as spatial.closest(), but returns a Stream of a custom result type instead of the Stream of node results that spatial.closest() returns. The custom results carry timing information for the stored procedure.

The stored procedure execution time is split evenly between the internal call to getLayerOrThrow( ) and SpatialTopologyUtils.findClosestEdges( ).

1) Why does getLayer(layerName) take so long to execute? I am very surprised to observe getLayer(layerName) takes so long: 2.5 - 5 seconds. There is only one layer, the OSM layer, directly off the root node. I see the same hit on calls to spatial.getLayer(). Since the layer is an argument to many of the spatial procedures, this is a big deal. Anyone have insight into this?

2) Is there a way to speed up SpatialTopologyUtils.findClosestEdges( )? Are there additional indexes that could be added to speed up the spatial proximity search?

My understanding is that Neo4j is capable of handling billions of nodes / relationships. For this project I am planning to load North America OSM data. From my understanding of the spatial plugin, it has spatial management and searching capabilities that would provide a good starting foundation.

Not solving your problem, but if you just want nearby edges/ways you might have a look at other projects like PostGIS, or at projects tuned for this purpose of 'map matching' such as github.com/graphhopper/map-matching (note I'm one of the authors of GraphHopper). Then the database will be under 100MB for a city, with similarly low RAM usage. - Karussell
Thank you @Karussell. Though good suggestions, this project is a technology evaluation / verification / proof of concept for a client. This phase is focused on Neo4j specifically and I need to responsibly pull on this thread. I have done some light lifting with PostGIS extensions supporting OSM Nominatim for reverse geocoding. - Blake
@Blake, it has been over three years since your post. I am wondering if you could share whether or not you overcame the performance issues and, if yes, how you did it. We are looking into using Neo4j in OSM-like spatial applications. Thanks! - Bo Guo

1 Answer

@Bo Guo, sorry for the delayed response. I've been away from Neo4j for a bit. I replaced the existing indexing with geohash indexing (https://en.wikipedia.org/wiki/Geohash). As the OSM data was loaded, the roadways and boundaries were tested for intersections with geohash regions. Geohash worked nicely for lookup. Loading the OSM data was still a bear. North America from OSM data on an 8-core mid-range AMD server with SATA SSDs would take several days to a week.
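
For anyone wanting a feel for the geohash approach, here is a minimal Python sketch of the idea. It is not the code from this project. The class name GeohashIndex, its methods, and the default precision of 7 are assumptions made up for illustration, and it registers a way only under the cells that contain its vertices, which is a simplification of a real intersection test (a long segment can cross a cell without having a vertex in it).

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = (value << 1) | 1, mid
            else:
                value, lon_hi = value << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


class GeohashIndex:
    """Hypothetical in-memory index: geohash cell -> ids of ways touching it."""

    def __init__(self, precision=7):
        self.precision = precision
        self._cells = defaultdict(set)

    def add_way(self, way_id, points):
        # Simplification: only the cells containing a vertex are registered.
        for lat, lon in points:
            self._cells[geohash_encode(lat, lon, self.precision)].add(way_id)

    def candidates(self, lat, lon):
        """Return way ids in the point's cell and its 8 neighbouring cells."""
        lon_bits = (5 * self.precision + 1) // 2
        lat_bits = (5 * self.precision) // 2
        width = 360.0 / (1 << lon_bits)   # cell width in degrees
        height = 180.0 / (1 << lat_bits)  # cell height in degrees
        found = set()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nlat = min(max(lat + dy * height, -90.0), 90.0)
                nlon = (lon + dx * width + 180.0) % 360.0 - 180.0
                cell = geohash_encode(nlat, nlon, self.precision)
                found |= self._cells.get(cell, set())
        return found


index = GeohashIndex(precision=7)
index.add_way("way-1", [(42.3601, -71.0589), (42.3605, -71.0580)])
index.add_way("way-2", [(42.5000, -71.5000)])
nearby = index.candidates(42.3602, -71.0588)
```

The lookup only has to read nine cells, so its cost does not depend on how many ways are loaded in total. In a graph database the same idea would presumably be stored as an indexed property or as cell nodes linked to ways; that mapping is left out here.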