17
votes

The heap is a classical data structure that puts a complete binary (or d-ary, for the generalized version) tree into a contiguous array, storing the elements in breadth-first traversal order. In this way, all elements from the same level of the tree are stored contiguously, one after the other.
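For reference, a minimal sketch of the index arithmetic this layout gives you (the standard heap arithmetic, 0-based indices, function names chosen just for illustration):

/* Breadth-first (heap) layout of a complete d-ary tree, 0-based indices. */
int parent(int i, int d)           { return (i - 1) / d; }
int kth_child(int i, int d, int k) { return d * i + k + 1; }  /* k in [0, d) */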

I'm implementing a data structure which, under the hood, is a complete balanced tree of fixed degree d, and I want to store the tree in a contiguous form to save the space taken by node pointers. So I thought of putting the nodes into the breadth-first order used in heaps, but then I'm worried about the cache performance of a typical search from the root down to a leaf, since at each level l I jump over a lot of elements.

Is there a way to obtain a compact, contiguous representation of a complete d-ary tree based on depth-first order instead?

This way, the nodes touched during a search from the root down to a leaf seem to me more likely to be close to each other. The problem then is how to retrieve the indices of the parent and children of a node, but I'm also wondering which operations on the tree are efficient in general in this setting.

I'm implementing this thing in C++, in case it matters at all.

My first thought is that this will be cache-efficient only for tree traversals that primarily move towards the left. Whenever you move to the right child of a node, you must skip the indices occupied by all the nodes in the left subtree. – Aasmund Eldhuset
Yes, you're right. But as the depth increases, the distance decreases, so maybe there's a speedup anyway... Of course, I would be perfectly happy with an answer that tells me exactly why this approach doesn't provide any benefit. – gigabytes
True, but cache lines are typically only 64 bytes wide, so you need to be on a very low level of the tree in order to stay inside one (and a big speedup on a small part of the search won't help much, since most of the time will be spent on the slow part of the search). But it's an interesting question, and it would be interesting to see if anyone has done any research on this kind of tree. – Aasmund Eldhuset
Obviously left-child(i) = i + 1, and with a bit more thought, right-child(i) = i + d^(MAX_LEVEL - level(i)) (at least for d = 2). How to get the current level from i I couldn't figure out in a few minutes. I suspect it involves a logarithm. – user395760
@delnan: I was stuck at the same step. Also, I couldn't figure out how to tell whether a node is a leaf or not. For leaves, although they have elements after them, i+1 is not the left child. – Neo M Hacker

4 Answers

8
votes

For simplicity, I'm going to limit my discussion to binary trees, but what I say holds true for n-ary trees, too.

The reason heaps (and trees in general) are stored in arrays breadth-first is because it's much easier to add and remove items that way: to grow and shrink the tree. If you're storing depth-first, then either the tree has to be allocated at its maximum expected size, or you have to do a lot of moving items around when you add levels.
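As a minimal sketch of why growing is cheap in the breadth-first layout (using std::vector as the backing array; the sift-up step a real heap would do afterwards is omitted):

#include <vector>

/* Appending to the array is all it takes to keep the tree complete: the next
   free slot is exactly where the new node belongs. */
template <typename T>
void grow(std::vector<T>& tree, const T& value){
  tree.push_back(value);
}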

But if you know that you're going to have a complete, balanced, n-ary tree, then the choice of BFS or DFS representation is largely a matter of style. There isn't any particular benefit to one over the other in terms of memory performance. In one representation (DFS) you take the cache misses up front, and in the other case (BFS) you take the cache misses at the end.

Consider a binary tree with 20 levels (i.e. 2^20 - 1 items) that contains the numbers 0 through 2^20 - 2. Each node occupies four bytes (the size of an integer).

With BFS, you incur a cache miss when you read the first block of the tree. But then you have the first four levels of the tree in cache (a 64-byte line holds 16 four-byte nodes, and levels 1 through 4 contain only 15). So your next three queries are guaranteed to be in cache. After that, you're guaranteed to have a cache miss when the node index is greater than 15, because the left child is at x*2 + 1, which will be at least 16 positions (64 bytes) away from the parent.

With DFS, you incur a cache miss when you read the first block of the tree. As long as the number you're searching for is in the left subtree of the current node, you're guaranteed not to get a cache miss for the first 15 levels (i.e. you continually go left). But any branch that goes right will incur a cache miss until you get down to three levels above the leaves. At that point, the entire subtree (15 nodes, 60 bytes) will fit into the cache and your remaining queries incur no cache misses.

With BFS, the number of cache misses is directly proportional to the number of levels you have to search. With DFS, the number of cache misses is proportional to the path taken through the tree and the number of levels you have to search. But on average, the number of cache misses you incur when searching for an item will be the same for DFS as for BFS.

And the math for computing the node positions is easier for BFS than it is for DFS, especially when you want to find the parent of a particular node.
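To make that concrete, here is a rough sketch of the child-index math in the two layouts for a complete binary tree (0-based indices; levels_below is the number of levels strictly below the node, which is exactly the awkward quantity to obtain in the depth-first case):

/* Breadth-first (heap) layout: positions depend only on the index itself. */
int bfs_left(int i)   { return 2 * i + 1; }
int bfs_right(int i)  { return 2 * i + 2; }
int bfs_parent(int i) { return (i - 1) / 2; }

/* Depth-first (preorder) layout: the right child sits past the whole left
   subtree, so you also need to know how deep that subtree is. */
int dfs_left(int i)                    { return i + 1; }
int dfs_right(int i, int levels_below) { return i + (1 << levels_below); }

The parent is messier still in the depth-first layout: a left child's parent is at i - 1, but a right child's parent is at i - (1 << (levels_below + 1)), and you also have to know which kind of child you are.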

1
votes

It would appear that an is_leaf indicator is needed. Because almost everything here depends on the level, we need a quick way of finding it, which in turn seems to depend on knowing whether a node is a leaf or not.

The snippets below also assume that the position of the node relative to its parent is known... it's not pretty, and it's pretty much useless since the whole point is to save space.

/* Preorder (depth-first) layout, 0-based indices.  Levels are numbered from
   1 at the root, num_levels is the total number of levels, and level() and
   the LEFT/RIGHT position flag are assumed to be available. */
int parent_index(int index, int pos){
  if (pos == LEFT){
    return index - 1;  /* a left child directly follows its parent */
  } else {
    return index - (1 << (num_levels - level(index) + 1));  /* jump back over the left subtree */
  }
}

int left_child_index(int index){
  return index + 1;  /* the left child directly follows its parent */
}
int right_child_index(int index){
  return index + (1 << (num_levels - level(index)));  /* skip the whole left subtree */
}

To get the level of a node, you could walk the left children until you get to a leaf.
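A rough sketch of that idea, assuming a node struct carrying the is_leaf flag mentioned above, a global num_levels, and levels numbered from 1 at the root (matching the snippets above):

/* In preorder layout the left-child chain of node `index` is index, index+1,
   index+2, ..., so counting the steps until a leaf gives the number of levels
   below it. */
int level(const node node_array[], int index){
  int below = 0;
  while (!node_array[index + below].is_leaf)
    below++;
  return num_levels - below;  /* the root has num_levels - 1 levels below it, so its level is 1 */
}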

The differences between tree indices appear to resemble something similar to Pascal's triangle, so that may be helpful too.

1
votes

I just had a thought.

How about infix order? That way everything is much easier to calculate:

/* In-order (infix) layout, 0-based indices.  level(i) is counted from the
   bottom: leaves are level 0, the root is level num_levels - 1. */
int level(unsigned int i);  /* defined below */

bool is_leaf(unsigned int i){
  return !(i%2);  /* in this layout the leaves sit at the even indices */
}

unsigned int left_child(unsigned int i){
  return i - (1u << (level(i) - 1));  /* step back over half of the subtree */
}
unsigned int right_child(unsigned int i){
  return i + (1u << (level(i) - 1));  /* step forward over half of the subtree */
}

int level(unsigned int i){
  unsigned int offset = 1;      /* 2^(level+1) - 1 */
  unsigned int level_bits = 1;
  int level = 0;
  while (((i - offset) & level_bits) == 0){
    level++;
    offset += 1u << level;
    level_bits = (level_bits << 1) + 1; /* a string of trailing 1s */
  }
  return level;
}
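As a usage sketch (hypothetical, but matching the conventions above: 0-based indices, level() counted up from the leaves), a root-to-leaf search starts in the middle of the array and calls the helpers from the snippet:

/* Sketch: descend towards `target`, assuming the array holds num_levels full
   levels of BST-ordered values; returns the matching index or the leaf reached. */
unsigned int find_slot(const int values[], unsigned int num_levels, int target){
  unsigned int i = (1u << (num_levels - 1)) - 1;  /* the root sits in the middle */
  while (!is_leaf(i) && values[i] != target)
    i = (target < values[i]) ? left_child(i) : right_child(i);
  return i;
}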

This way you should get big jumps only at the topmost nodes. After that, the jumps shrink exponentially. The beauty of this is that, because there are only a few nodes near the top, you could potentially cache those. Where the tree is much denser (i.e. where most of the comparisons happen), the jumps are a lot smaller.

The drawback is that inserts are slow:

void insert_node(node node_array[], node new_node){
  /* grow by one level: every existing node moves from i to 2*i + 1, and the
     new bottom level of leaves takes the even slots; walk backwards so no
     element is overwritten before it has been moved */
  for (int i = num_nodes - 1; i >= 0; i--){
    node_array[i*2 + 1] = node_array[i];
    node_array[i] = NULL_NODE_VALUE; /* complete (but not full) trees are discontiguous */
  }
  node_array[0] = new_node;
}

This infix order is no doubt a lot better than the prefix (depth-first) order, since the tree is both logically and physically 'balanced'. In prefix order, the left side is favoured a lot more, so it would behave like an unbalanced tree anyhow. At least with infix order, you get a balanced and quick search amongst the dense nodes.

1
votes

Binary search trees are used to store information which can later be queried and sorted efficiently. The left child of any particular node contains a value smaller than that node's value, and the right child contains a greater value.

Is a heap an efficient implementation of almost complete binary search trees?

Binary search trees need at least two additional pointers (there can be a parent pointer as well) besides the data value of a particular node. Heap-based structures convert this pointer manipulation into array index manipulation by exploiting the properties of an almost complete BST. We also know that if a particular BST is not close to an almost complete BST, we create holes in the array representation of that binary tree in order to maintain the relationship between parent and child nodes, which could cancel out the savings from dropping the pointers in such cases.

Can we implement a heap-like structure based on depth-first traversal order?

By trying to implement a heap-like structure based on depth-first traversal order, we are no longer able to justify the reason for using a heap in the first place. Since the depth is not fixed, in contrast to the breadth of the tree (which can be calculated at a particular level, given that the tree is an almost complete BST), we have to manage more complex relationships between elements. And whenever a new element is added to or deleted from the tree, we also have to rearrange the elements so that they still satisfy the heap property. So I don't think we would be able to justify the use of a heap over a BST if it is implemented this way.