8 votes

My objective is to write a shortest path algorithm for a road network.

Currently my architecture is something like this: I store all the data in a PostGIS-enabled PostgreSQL database. I do one SELECT * FROM ways, which takes less than 3 seconds on a table with 100,000 edges (ways), and after that I apply a (Java, Ruby, or anything-based) shortest-path algorithm to the graph that already resides in memory. The second operation takes about 1.5 seconds on a graph with 100,000 edges.
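Roughly, the load step looks like this minimal Java/JDBC sketch (the source, target and cost column names are simplified assumptions, not my exact schema):

    import java.sql.*;
    import java.util.*;

    // Directed edge in the in-memory graph.
    record Edge(long target, double cost) {}

    public class GraphLoader {
        // Builds an adjacency list from the ways table in one pass.
        // Assumes columns source, target, cost; adjust to your schema.
        public static Map<Long, List<Edge>> load(Connection conn) throws SQLException {
            Map<Long, List<Edge>> graph = new HashMap<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT source, target, cost FROM ways")) {
                while (rs.next()) {
                    long source = rs.getLong("source");
                    long target = rs.getLong("target");
                    double cost = rs.getDouble("cost");
                    graph.computeIfAbsent(source, k -> new ArrayList<>())
                         .add(new Edge(target, cost));
                }
            }
            return graph;
        }
    }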

So, it takes:

  • 2-3 seconds to load all the ways from the database into memory and build the graph (nodes are stored in the same table as the ways, i.e. the edges);
  • 1-1.5 seconds to compute a shortest path on the graph that is already in memory (sketched below).
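The second step is a textbook Dijkstra over that adjacency map; a sketch, reusing the Edge record from above:

    import java.util.*;

    public class Dijkstra {
        private record State(long node, double dist) {}

        // Classic Dijkstra with a priority queue; returns the cost of the
        // cheapest path from start to goal, or +infinity if unreachable.
        public static double shortestPath(Map<Long, List<Edge>> graph,
                                          long start, long goal) {
            Map<Long, Double> best = new HashMap<>();
            PriorityQueue<State> pq =
                    new PriorityQueue<>(Comparator.comparingDouble(State::dist));
            best.put(start, 0.0);
            pq.add(new State(start, 0.0));
            while (!pq.isEmpty()) {
                State cur = pq.poll();
                if (cur.node() == goal) return cur.dist();
                // Skip stale queue entries that were superseded by a cheaper path.
                if (cur.dist() > best.getOrDefault(cur.node(), Double.POSITIVE_INFINITY)) continue;
                for (Edge e : graph.getOrDefault(cur.node(), List.of())) {
                    double nd = cur.dist() + e.cost();
                    if (nd < best.getOrDefault(e.target(), Double.POSITIVE_INFINITY)) {
                        best.put(e.target(), nd);
                        pq.add(new State(e.target(), nd));
                    }
                }
            }
            return Double.POSITIVE_INFINITY;
        }
    }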

This is very similar to what pgRouting does (to my knowledge it uses the C++ Boost Graph Library to hold the graph in memory), except that pgRouting takes about 2 seconds in total to compute a shortest path on the same data set (yes, it is fast, but it is a black box to me, so I need my own implementation).

But recently I found out about graph databases and about Neo4j. On their site they claim: "Still being able to do these calculations in sub-second speeds on graphs of millions of roads and waypoints makes it possible in many cases to abandon the normal approach of precomputing indexes with K/V stores and be able to put routing into the critical path with the possibility to adapt to the live conditions and build highly personalized and dynamic spatial services."
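My understanding is that the equivalent routing step with Neo4j's embedded Java API (the graph-algo component) would look roughly like this; the Waypoint label, ROAD type and cost property are assumed names, and the class names are from the 3.x API, so this is a sketch rather than tested code:

    import org.neo4j.graphalgo.GraphAlgoFactory;
    import org.neo4j.graphalgo.PathFinder;
    import org.neo4j.graphalgo.WeightedPath;
    import org.neo4j.graphdb.*;

    public class Neo4jRouting {
        // Weighted shortest path between two stored nodes, read straight
        // from the database instead of a hand-built in-memory graph.
        public static WeightedPath route(GraphDatabaseService db, long fromId, long toId) {
            try (Transaction tx = db.beginTx()) {
                Node start = db.findNode(Label.label("Waypoint"), "id", fromId);
                Node end   = db.findNode(Label.label("Waypoint"), "id", toId);
                PathFinder<WeightedPath> dijkstra = GraphAlgoFactory.dijkstra(
                        PathExpanders.forTypeAndDirection(
                                RelationshipType.withName("ROAD"), Direction.BOTH),
                        "cost"); // edge property holding the traversal cost
                WeightedPath path = dijkstra.findSinglePath(start, end);
                tx.success();
                return path;
            }
        }
    }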

So the question is: will a graph database be faster for my particular problem?

The problem has the following properties:

  • the database consists of one table (ways);
  • the only query to the database is the one that loads all the ways into memory (to build the graph);
  • I do not need scalability, i.e. it is likely that the graph will not grow.

4 Answers

3 votes

You certainly don't have to reinvent the wheel if you use a graph database like Neo4j. Many shortest-path algorithms are built in, and it is designed to handle complexity such as speed limits on specific roads, one-way roads, per-road scores, etc. And how do you keep up with performance when your data grows 10 or 100 times? Given your total computation time of about 3 seconds for 100,000 ways, it could be minutes for 1M ways, whereas in Neo4j the response would still be in milliseconds.
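As a sketch of what that modeling can look like with Neo4j's embedded API (the property names here are invented for illustration), a one-way road is just a relationship created in one direction, with its attributes stored as properties that a cost evaluator can use:

    import org.neo4j.graphdb.*;

    public class RoadModel {
        // Creates a one-way road segment from a to b (inside a transaction);
        // direction is encoded by the relationship itself, attributes by its
        // properties.
        public static void addOneWayRoad(Node a, Node b, double lengthM, int speedLimitKmh) {
            Relationship road = a.createRelationshipTo(b, RelationshipType.withName("ROAD"));
            road.setProperty("length", lengthM);
            road.setProperty("speedLimit", speedLimitKmh);
            // Travel time as the routing cost, so Dijkstra prefers faster roads.
            road.setProperty("cost", lengthM / (speedLimitKmh / 3.6));
        }
    }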

3 votes

The breakthrough with graph databases is not only performance; it is more about the concept: your routing algorithms deal with single-relational graphs (that is, graphs where all links are of the same type), whereas with graph databases you have a multi-relational graph.

This enables you to compute the shortest path between nodes while taking only a specific kind of edge, or avoiding another type, as in the sketch below.
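For example, with Neo4j's PathExpanders the same Dijkstra call can be restricted to certain relationship types (the HIGHWAY/LOCAL_ROAD/FERRY types below are hypothetical, and exact class names vary by version):

    import org.neo4j.graphalgo.GraphAlgoFactory;
    import org.neo4j.graphalgo.PathFinder;
    import org.neo4j.graphalgo.WeightedPath;
    import org.neo4j.graphdb.*;

    public class TypedRouting {
        // Dijkstra that follows HIGHWAY edges only in their stored direction,
        // LOCAL_ROAD edges both ways, and ignores every other edge type
        // (e.g. FERRY) without touching the stored data.
        public static WeightedPath carRoute(Node start, Node end) {
            PathExpander<Object> carOnly = PathExpanders.forTypesAndDirections(
                    RelationshipType.withName("HIGHWAY"), Direction.OUTGOING,
                    RelationshipType.withName("LOCAL_ROAD"), Direction.BOTH);
            PathFinder<WeightedPath> dijkstra = GraphAlgoFactory.dijkstra(carOnly, "cost");
            return dijkstra.findSinglePath(start, end);
        }
    }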

For more information you should read about the algebra behind graph databases and the concept of pipes.

I strongly recommend the TinkerPop project as a starting point for graph databases.

1 vote

I don't have experience with graph databases, but judging by your question I have a few things in mind.

First of all, the straightforward answer is: build such a graph database and do a performance comparison against your current solution. You could measure memory usage, execution time (speed), CPU utilization and possibly other metrics; see the sketch below. That would give you enough data to make your decision.
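A minimal timing harness for such a comparison could look like the following sketch; loadGraph and route are hypothetical stand-ins for whichever implementation you are measuring:

    public class Benchmark {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long memBefore = rt.totalMemory() - rt.freeMemory();

            long t0 = System.nanoTime();
            Object graph = loadGraph();            // stand-in: DB load + graph build
            long t1 = System.nanoTime();
            route(graph, 42L, 4242L);              // stand-in: one shortest-path query
            long t2 = System.nanoTime();

            long memAfter = rt.totalMemory() - rt.freeMemory();
            System.out.printf("load: %d ms, route: %d ms, ~heap used: %d MiB%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000,
                    (memAfter - memBefore) / (1024 * 1024));
        }

        // Hypothetical hooks; plug in the PostGIS and graph-database variants here.
        static Object loadGraph() { return new Object(); }
        static void route(Object graph, long from, long to) {}
    }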

My other advice is to revise your method. The three problem properties you described (one table, loading all ways, and no need for scalability) apply in your current domain but not in the graph database's one. It is a whole different programming paradigm, and you might have to adjust and adapt your method to suit this special kind of database. It is unreasonable to do performance or any other kind of comparison if you apply your standard approach in a non-standard environment (such as a graph database).

Recap: translate your problem into the terms of the graph database and model it accordingly. After doing that, run a performance comparison between the two solutions.

My bet is that, assuming you have translated and modeled your problem suitably for the graph database, it will give you better performance. Your classical "store-read-sort" approach is simple but not that effective unless it is optimized aggressively.

0 votes

A graph database probably won't load all of your data into memory up front, because good ones are designed to deal with datasets far larger than RAM. However, once the data is there, the graph database has to do less work than a relational database to traverse the links: it can directly access related objects through their identities, rather than having to go through B-tree indexes and (possibly) a join table, so it should be faster once the nodes and edges are cached.
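As an illustration (using Neo4j's embedded API; the SQL comment shows the relational work each hop replaces), expanding a node in a graph store is a direct traversal of stored adjacency rather than an index lookup per hop:

    import org.neo4j.graphdb.*;

    public class Neighbors {
        // Relational equivalent per hop: SELECT target FROM ways WHERE source = ?
        // (a B-tree lookup each time). Here the node already holds its
        // adjacency, so expanding it is a direct pointer chase.
        public static void expand(Node node) {
            for (Relationship r : node.getRelationships(Direction.OUTGOING)) {
                Node neighbor = r.getOtherNode(node);
                // ... relax the edge, enqueue neighbor, etc.
            }
        }
    }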