
Here's the rundown of what I need:

  • A graph database
  • Each node is a document. There will be several hundred node types, each with its own consistent schema.
  • Can scale to billions of nodes
  • Each node also has a (lat, lng) coordinate, in addition to its edges to other nodes
  • I want to use (lat, lng) as the shard key so this can scale to a large sharded, replicated cluster. About 95% of edge traversals will stay within nearby (lat, lng) locations.
  • I want to be able to issue geo+document queries. For example "Show me all the graph nodes/documents matching this query { ... } ordered by distance from (lat_0, lng_0)"
  • I want something that's well-documented, has an active developer community, is recommended for production use, and likely to be around for years.
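To pin down the geo+document query semantics I'm after (filter on document fields, then order by distance from a point), here is a minimal in-memory sketch. It only illustrates the intended behaviour, not how a database would execute it at scale; the `Node` class, the `geo_document_query` function, and the use of haversine great-circle distance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical model)

@dataclass
class Node:
    node_id: str
    node_type: str
    lat: float
    lng: float
    doc: dict = field(default_factory=dict)    # the node's document
    edges: list = field(default_factory=list)  # ids of neighbouring nodes

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_document_query(nodes, query, lat0, lng0, limit=None):
    """Hypothetical query: keep nodes whose document matches every key/value
    in `query`, then order them by distance from (lat0, lng0)."""
    matches = [n for n in nodes if all(n.doc.get(k) == v for k, v in query.items())]
    matches.sort(key=lambda n: haversine_km(n.lat, n.lng, lat0, lng0))
    return matches[:limit] if limit is not None else matches

# Usage: open cafes, nearest first, relative to lower Manhattan.
nodes = [
    Node("a", "cafe", 40.7580, -73.9855, {"open": True}),
    Node("b", "cafe", 40.7484, -73.9857, {"open": True}),
    Node("c", "cafe", 40.7128, -74.0060, {"open": False}),
]
result = geo_document_query(nodes, {"open": True}, 40.7128, -74.0060)
print([n.node_id for n in result])  # ['b', 'a']
```

In a real system the filter and the distance ordering would need to be served by a combined index rather than a full scan, which is exactly the capability I'm looking for.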

Here are problems with existing databases:

  • MongoDB: no graph support, no joins
  • Neo4j: no sharding
  • OrientDB: no geospatial indexing
  • ArangoDB: can do WITHIN queries but cannot have additional query clauses (e.g. MongoDB's geoNear has a query parameter)

Is there anything that fits my use case?

Answer

Would you like a unicorn and a machine that prints an unlimited number of $100 bills to go along with that? Har har har...

OK, but seriously, you've got a tall order there. You're going to need a custom system that combines several of those products. As you've observed, there's really no such thing as a "graph/document" database that meets all of these requirements.

As a general area of systems research, many people are looking into hybrid systems. For example, you could maintain your graph structure in Neo4j and have the IDs of nodes in Neo4j point to identifiers for documents in MongoDB. That way you'd have a graph/document database, but it would really be two databases. Such hybrid systems are rife with tradeoffs. First, writing a single query that spans both systems will be extremely difficult. Second, you'll introduce data dependencies between them, so it might not be easy to update your graph structure without changing your documents, or vice versa.
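A minimal sketch of that hybrid pattern, using plain Python dictionaries as stand-ins for the two stores. No real Neo4j or MongoDB client calls are made; `GraphStore`, `DocumentStore`, and `neighbours_matching` are hypothetical names. What it illustrates is that the "join" between graph and documents happens in application code, with one query per system.

```python
from collections import deque

class GraphStore:
    """Stand-in for the graph database: holds only node ids and edges."""
    def __init__(self):
        self.adj = {}

    def add_edge(self, a, b):
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def within_hops(self, start, max_hops):
        """Breadth-first traversal; returns ids reachable in 1..max_hops hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        found = []
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in sorted(self.adj.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    frontier.append((nxt, depth + 1))
        return found

class DocumentStore:
    """Stand-in for the document database: maps id -> document."""
    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc

    def find_many(self, ids, query):
        return [self.docs[i] for i in ids
                if i in self.docs
                and all(self.docs[i].get(k) == v for k, v in query.items())]

def neighbours_matching(graph, documents, start, max_hops, query):
    # Query 1: the traversal runs entirely in the graph store.
    ids = graph.within_hops(start, max_hops)
    # Query 2: a separate lookup in the document store; the application joins them.
    return documents.find_many(ids, query)

graph = GraphStore()
for a, b in [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]:
    graph.add_edge(a, b)
docs = DocumentStore()
docs.put("n2", {"id": "n2", "type": "road"})
docs.put("n3", {"id": "n3", "type": "shop"})
docs.put("n4", {"id": "n4", "type": "shop"})
print(neighbours_matching(graph, docs, "n1", 2, {"type": "shop"}))
# [{'id': 'n3', 'type': 'shop'}]
```

Note that nothing keeps the two stores consistent: deleting "n3" from the document store leaves a dangling id in the graph, which is the kind of cross-system data dependency described above.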

For really intense performance requirements, hybrid systems are sometimes the only way to go. But as a rule of thumb, for every 100 times you see someone say they need such a solution, in probably 80 of them they'd be better off picking just one database and living with the pros and cons it comes with. Technology is ultimately about choices, pros, and cons, and learning to live with what you've picked. :)

To give you a succinct answer to the question you've asked: no, there's nothing that does all of that. I'd recommend you work with an architect or consultant who can explore your requirements in depth and recommend the architecture that best suits most of your needs, balancing simplicity and cost. That's as much an art as a science.