Good Overviews
Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:
- One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
- Models for hierarchical data: slides with good explanations of tradeoffs and example usage
- Representing hierarchies in MySQL: very good overview of Nested Set in particular
- Hierarchical data in RDBMSs: most comprehensive and well-organized set of links I've seen, but not much in the way of explanation
Options
Ones I am aware of and general features:
- Columns: ID, ParentID
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find the level, ancestry & descendants, path
- Avoid N+1 via Common Table Expressions in databases that support them
- Columns: Left, Right
- Cheap ancestry, descendants
- Very expensive
O(n/2)
moves, inserts, deletes due to volatile encoding
- Bridge Table (a.k.a. Closure Table /w triggers)
- Uses separate join table with: ancestor, descendant, depth (optional)
- Cheap ancestry and descendants
- Writes costs
O(log n)
(size of subtree) for insert, updates, deletes - Normalized encoding: good for RDBMS statistics & query planner in joins
- Requires multiple rows per node
- Lineage Column (a.k.a. Materialized Path, Path Enumeration)
- Column: lineage (e.g. /parent/child/grandchild/etc...)
- Cheap descendants via prefix query (e.g.
LEFT(lineage, #) = '/enumerated/path'
) - Writes costs
O(log n)
(size of subtree) for insert, updates, deletes - Non-relational: relies on Array datatype or serialized string format
- Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
- Has real/float/decimal representation/precision issues
- Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with added trickiness of linear algebra.
- A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Cheap to iterate/paginate over
- Expensive move and delete
- Good Use: threaded discussion - forums / blog comments
- Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
- Cheap ancestors, descendants, level
- Cheap insert, delete, move of the leaves
- Expensive insert, delete, move of the internal nodes
- Hard limit to how deep the hierarchy can be
Database Specific Notes
MySQL
Oracle
- Use CONNECT BY to traverse Adjacency Lists
PostgreSQL
- ltree datatype for Materialized Path
SQL Server
- General summary
- 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.
Closure Tables
are superior toAdjacency List
,Path Enumeration
andNested Sets
in terms of ease of use (and I'm guessing performance as well). – Gili