17
votes

The heap is a classical data structure that puts a complete binary (or d-ary, for the generalized version) tree into a contiguous array, storing the elements in breadth-first traversal order. In this way, all elements from the same level of the tree are stored contiguously, one after the other.
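For reference, a minimal sketch of the index arithmetic this layout gives you (the standard heap arithmetic, 0-based indices, function names chosen just for illustration):

/* Breadth-first (heap) layout of a complete d-ary tree, 0-based indices. */
int parent(int i, int d)           { return (i - 1) / d; }
int kth_child(int i, int d, int k) { return d * i + k + 1; }  /* k in [0, d) */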

I'm implementing a data structure which, under the hood, is a complete balanced tree of fixed degree d, and I want to store the tree in a contiguous form to save the space taken by node pointers. So I thought of putting the nodes into the breadth-first order used in heaps, but then I'm worried about the cache performance of a typical search from the root down to a leaf, since at each level l I jump over a lot of elements.

Is there a way to obtain a compact, contiguous representation of a complete d-ary tree based on depth-first order instead?

This way, the nodes touched during a search from the root down to a leaf seem to me more likely to be close to each other. The problem then is how to retrieve the indices of the parent and children of a node, but I'm also wondering which operations on the tree are efficient in general in this setting.

I'm implementing this thing in C++, in case it matters at all.

My first thought is that this will be cache-efficient only for tree traversals that primarily move towards the left. Whenever you move to the right child of a node, you must skip the indices occupied by all the nodes in the left subtree. – Aasmund Eldhuset
Yes, you're right. But as the depth increases, the distance decreases, so maybe there's a speedup anyway... Of course, I would be perfectly happy with an answer that tells me exactly why this approach doesn't provide any benefit. – gigabytes
True, but cache lines are typically only 64 bytes wide, so you need to be on a very low level of the tree in order to stay inside one (and a big speedup on a small part of the search won't help much, since most of the time will be spent on the slow part of the search). But it's an interesting question, and it would be interesting to see if anyone has done any research on this kind of tree. – Aasmund Eldhuset
Obviously left-child(i) = i + 1, and with a bit more thought, right-child(i) = i + d^(MAX_LEVEL - level(i)) (at least for d = 2). How to get the current level from i I couldn't figure out in a few minutes. I suspect it involves a logarithm. – user395760
@delnan: I was stuck at the same step. Also, I couldn't figure out how to tell whether a node is a leaf or not. For leaves, although they have elements after them, i+1 is not the left child. – Neo M Hacker

4 Answers

8
votes

For simplicity, I'm going to limit my discussion to binary trees, but what I say holds true for n-ary trees, too.

The reason heaps (and trees in general) are stored in arrays breadth-first is because it's much easier to add and remove items that way: to grow and shrink the tree. If you're storing depth-first, then either the tree has to be allocated at its maximum expected size, or you have to do a lot of moving items around when you add levels.
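As a minimal sketch of why growing is cheap in the breadth-first layout (using std::vector as the backing array; the sift-up step a real heap would do afterwards is omitted):

#include <vector>

/* Appending to the array is all it takes to keep the tree complete: the next
   free slot is exactly where the new node belongs. */
template <typename T>
void grow(std::vector<T>& tree, const T& value){
  tree.push_back(value);
}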

But if you know that you're going to have a complete, balanced, n-ary tree, then the choice of BFS or DFS representation is largely a matter of style. There isn't any particular benefit to one over the other in terms of memory performance. In one representation (DFS) you take the cache misses up front, and in the other case (BFS) you take the cache misses at the end.

Consider a binary tree with 20 levels (i.e. 2^20 - 1 items) that contains the numbers 0 through 2^20 - 2. Each node occupies four bytes (the size of an integer).

With BFS, you incur a cache miss when you read the first block of the tree. But then you have the first four levels of the tree in cache (a 64-byte line holds 16 four-byte nodes, and levels 1 through 4 contain only 15). So your next three queries are guaranteed to be in cache. After that, you're guaranteed to have a cache miss when the node index is greater than 15, because the left child is at x*2 + 1, which will be at least 16 positions (64 bytes) away from the parent.

With DFS, you incur a cache miss when you read the first block of the tree. As long as the number you're searching for is in the left subtree of the current node, you're guaranteed not to get a cache miss for the first 15 levels (i.e. you continually go left). But any branch that goes right will incur a cache miss until you get down to three levels above the leaves. At that point, the entire subtree (15 nodes, 60 bytes) will fit into the cache and your remaining queries incur no cache misses.

With BFS, the number of cache misses is directly proportional to the number of levels you have to search. With DFS, the number of cache misses is proportional to the path taken through the tree and the number of levels you have to search. But on average, the number of cache misses you incur when searching for an item will be the same for DFS as for BFS.

And the math for computing the node positions is easier for BFS than it is for DFS, especially when you want to find the parent of a particular node.
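To make that concrete, here is a rough sketch of the child-index math in the two layouts for a complete binary tree (0-based indices; levels_below is the number of levels strictly below the node, which is exactly the awkward quantity to obtain in the depth-first case):

/* Breadth-first (heap) layout: positions depend only on the index itself. */
int bfs_left(int i)   { return 2 * i + 1; }
int bfs_right(int i)  { return 2 * i + 2; }
int bfs_parent(int i) { return (i - 1) / 2; }

/* Depth-first (preorder) layout: the right child sits past the whole left
   subtree, so you also need to know how deep that subtree is. */
int dfs_left(int i)                    { return i + 1; }
int dfs_right(int i, int levels_below) { return i + (1 << levels_below); }

The parent is messier still in the depth-first layout: a left child's parent is at i - 1, but a right child's parent is at i - (1 << (levels_below + 1)), and you also have to know which kind of child you are.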

1
votes

It would appear that an is_leaf indicator is needed. Because almost everything here depends on the level, we need a quick way of finding it, which in turn seems to depend on knowing whether a node is a leaf or not.

The snippets below also assume that the position of the node relative to its parent is known... it's not pretty, and it's pretty much useless since the whole point is to save space.

/* Preorder (depth-first) layout, 0-based indices.  Levels are numbered from
   1 at the root, num_levels is the total number of levels, and level() and
   the LEFT/RIGHT position flag are assumed to be available. */
int parent_index(int index, int pos){
  if (pos == LEFT){
    return index - 1;  /* a left child directly follows its parent */
  } else {
    return index - (1 << (num_levels - level(index) + 1));  /* jump back over the left subtree */
  }
}

int left_child_index(int index){
  return index + 1;  /* the left child directly follows its parent */
}
int right_child_index(int index){
  return index + (1 << (num_levels - level(index)));  /* skip the whole left subtree */
}

To get the level of a node, you could walk the left children until you get to a leaf.
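A rough sketch of that idea, assuming a node struct carrying the is_leaf flag mentioned above, a global num_levels, and levels numbered from 1 at the root (matching the snippets above):

/* In preorder layout the left-child chain of node `index` is index, index+1,
   index+2, ..., so counting the steps until a leaf gives the number of levels
   below it. */
int level(const node node_array[], int index){
  int below = 0;
  while (!node_array[index + below].is_leaf)
    below++;
  return num_levels - below;  /* the root has num_levels - 1 levels below it, so its level is 1 */
}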

The differences between tree indices appear to resemble something similar to Pascal's triangle, so that may be helpful too.

1
votes

I just had a thought.

How about infix order? That way everything is much easier to calculate:

/* In-order (infix) layout, 0-based indices.  level(i) is counted from the
   bottom: leaves are level 0, the root is level num_levels - 1. */
int level(unsigned int i);  /* defined below */

bool is_leaf(unsigned int i){
  return !(i%2);  /* in this layout the leaves sit at the even indices */
}

unsigned int left_child(unsigned int i){
  return i - (1u << (level(i) - 1));  /* step back over half of the subtree */
}
unsigned int right_child(unsigned int i){
  return i + (1u << (level(i) - 1));  /* step forward over half of the subtree */
}

int level(unsigned int i){
  unsigned int offset = 1;      /* 2^(level+1) - 1 */
  unsigned int level_bits = 1;
  int level = 0;
  while (((i - offset) & level_bits) == 0){
    level++;
    offset += 1u << level;
    level_bits = (level_bits << 1) + 1; /* a string of trailing 1s */
  }
  return level;
}
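As a usage sketch (hypothetical, but matching the conventions above: 0-based indices, level() counted up from the leaves), a root-to-leaf search starts in the middle of the array and calls the helpers from the snippet:

/* Sketch: descend towards `target`, assuming the array holds num_levels full
   levels of BST-ordered values; returns the matching index or the leaf reached. */
unsigned int find_slot(const int values[], unsigned int num_levels, int target){
  unsigned int i = (1u << (num_levels - 1)) - 1;  /* the root sits in the middle */
  while (!is_leaf(i) && values[i] != target)
    i = (target < values[i]) ? left_child(i) : right_child(i);
  return i;
}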

This way you should get big jumps only at the topmost nodes. After that, the jumps shrink exponentially. The beauty of this is that, because there are only a few nodes near the top, you could potentially cache those. Where the tree is much denser (i.e. where most of the comparisons happen), the jumps are a lot smaller.

The drawback is that inserts are slow:

void insert_node(node node_array[], node new_node){
  /* grow by one level: every existing node moves from i to 2*i + 1, and the
     new bottom level of leaves takes the even slots; walk backwards so no
     element is overwritten before it has been moved */
  for (int i = num_nodes - 1; i >= 0; i--){
    node_array[i*2 + 1] = node_array[i];
    node_array[i] = NULL_NODE_VALUE; /* complete (but not full) trees are discontiguous */
  }
  node_array[0] = new_node;
}

This infix order is no doubt a lot better than the prefix (depth-first) order, since the tree is both logically and physically 'balanced'. In prefix order, the left side is favoured a lot more, so it would behave like an unbalanced tree anyhow. At least with infix order, you get a balanced and quick search amongst the dense nodes.

1
votes

Binary search trees are used to store information which can later be queried and sorted efficiently. The left child of any particular node contains a value smaller than that node's value, and the right child contains a greater value.

Is a heap an efficient implementation of almost complete binary search trees?

Binary search trees need at least two additional pointers (there can be a parent pointer as well) besides the data value of a particular node. Heap-based structures convert this pointer manipulation into array index manipulation by exploiting the properties of an almost complete BST. We also know that if a particular BST is not close to an almost complete BST, we create holes in the array representation of that binary tree in order to maintain the relationship between parent and child nodes, which could cancel out the savings from dropping the pointers in such cases.

Can we implement a heap-like structure based on depth-first traversal order?

By trying to implement a heap-like structure based on depth-first traversal order, we are no longer able to justify the reason for using a heap in the first place. Since the depth is not fixed, in contrast to the breadth of the tree (which can be calculated at a particular level, given that the tree is an almost complete BST), we have to manage more complex relationships between elements. And whenever a new element is added to or deleted from the tree, we also have to rearrange the elements so that they still satisfy the heap property. So I don't think we would be able to justify the use of a heap over a BST if it is implemented this way.