73
votes

What type of NoSQL database is best suited to store hierarchical data?

Say for example I want to store posts of a forum with a tree structure:

original post
 + re: original post
 + re: original post
   + re2: original post
     + re3: original post
   + re2: original post
13
I have an analogous problem in my data model. Neo4j works nicely but won't scale horizontally. I thought MongoDB would be better but since you can't retrieve embedded "original post" elements without knowing the schema from the top level, it is actually inferior to a graph database.Sridhar Sarnobat
@Sridhar-Sarnobat Maybe the future belongs to hybrid databases like OrientDB or ArrangoDB which combine document and graph databases. Even PostgreSQL supports JSON documents these days.deamon
Thanks for the suggestion. I'll take a closer look at thoseSridhar Sarnobat
I've worked with Neo4j and OrientDB in the past year, both offer better solutions for the type of problem described here than Mongo or Couch. Where the problem really is traversing a graph.orangepips

13 Answers

33
votes

MongoDB and CouchDB offer solutions, but not built in functionality. See this SO question on representing hierarchy in a relational database as most other NoSQL solutions I've seen are similar in this regard; where you have to write your own algorithms for recalculating that information as nodes are added, deleted and moved. Generally speaking you're making a decision between fast read times (e.g. nested set) or fast write times (adjacency list). See aforementioned SO question for more options along these lines - the flat table approach appears most aligned with your question.

One standard that does abstract away these considerations is the Java Content Repository (JCR), both Apache JackRabbit and JBoss eXo are implementations. Note, behind the scenes both are still doing some sort of algorithmic calculations to maintain hierarchy as described above. In addition, the JCR also handles permissions, file storage, and several other aspects - so it may be overkill for your project.

19
votes

This is graph database. Can be used as tree database.

http://neo4j.com/

18
votes

What you possibly need is a document-oriented database like MongoDB or CouchDB.

See examples of different techniques which allow you to store hierarchical data in MongoDB: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB

3
votes

Faced with the same issue, I decided to create my own (very simple) solution using Lua + Redis https://github.com/qbolec/Redis-Tree/

2
votes

Exist-db implemented hierarchical data model for xml persistence

2
votes

Graph databases would probably also solve this problem. If neo4j is not enough for you in terms of scaling, consider Titan, which is based on various storage back-ends including HBase and should scale very well. It is not as mature as neo4j, but it is a very promising project.

2
votes

LDAP, obviously. OpenLDAP would make short work of it.

2
votes

In mathematics, and, more specifically, in graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one path. So any graph db will do the job for sure. BTW an ordinary graph like a tree can be simply mapped to any relational or non-relational DB. To store hierarchical data into a relational db take a look at this awesome presentation by Bill Karwin. There are also ORMs with facilities to store trees. For example TypeORM supports the Adjacency list and Closure table patterns for storing hierarchical structures.

TypeORM is used in TypeScript\Javascript development. Check popular ORMs to find a one supporting trees based on your environment.

The king of Non-relational DBs [IMHO] is Mongodb. Check out it's documentation. to find out how it stores trees. Trees are the most common kind of graphs and they are used everywhere. Any well-established DB solution should have a way to deal with trees.

0
votes

Here's a non-answer for you. SQLServer 2008!!!! It's great for recursive queries. Or you can go the old fashioned route and store hierarchy data in a separate table to avoid recursion.

I think relational databases lend themselves very well to tree data. Both in query performance and ease of use. With one caveat.... you will be inserting into an indexed table, and probably several other indexed tables every time someone makes a post. Insert performance could be an issue on a facebook caliber forum.

0
votes

Check out MarkLogic. You can download a demo copy from the website. It is a database for unstructured data and falls under the NoSQL classification of databases. I know unstructured data is a pretty loaded term but just think of it as data that does not fit well in the rows and columns of a RDBMS (like hierarchical data).

0
votes

Just spent the weekend at a training course using MUMUPS db as a back-end for a full stack javascript browser application development framework. Great stuff! I'd recommend GT.M distro of MUMPS under GPL. Or try http://sourceforge.net/projects/mumps/?source=recommended for vanilla MUMPS. Check out http://robtweed.wordpress.com/ for ewd.js js framework and more info on MUMPS.

0
votes

A NoSql storage service with native support for hierarchical data is Amazon Web Service's Simple Storage Service (AWS S3). The path based keys are hierarchical by nature, and the blob values may be typed using attributes (mime type, e.g. application/json, text/csv, etc.). Advantages of S3 include the ability to scale to both extremely large overall capacity, versioning, as well as nearly infinite concurrent writes. Disadvantages include no support for conditional writes (optimistic concurrency), or consistent reads (only for read-after write) and no support for references/relationships. It is also purely usage based so wide variations in demand do not require complex scaling infrastructure or over-provisioned capacity.