I have been implementing an LLRB package that should be able to operate in either of the two modes, Bottom-Up 2-3 or Top-Down 2-3-4 described by Sedgewick (code - improved code, though dealing only with 2-3 trees here, thanks to RS for pointer).
Sedgewick provides a very clear description of tree operations for the 2-3 mode, although he spends a lot of time talking about the 2-3-4 mode. He also shows how a simple alteration of the order of color flipping during insertion can alter the behaviour of the tree (either split on the way down for 2-3-4 or split on the way up for 2-3):
private Node insert(Node h, Key key, Value value)
{
if (h == null)
return new Node(key, value);
// Include this for 2-3-4 trees
if (isRed(h.left) && isRed(h.right)) colorFlip(h);
int cmp = key.compareTo(h.key);
if (cmp == 0) h.val = value;
else if (cmp < 0) h.left = insert(h.left, key, value);
else h.right = insert(h.right, key, value);
if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
// Include this for 2-3 trees
if (isRed(h.left) && isRed(h.right)) colorFlip(h);
return h;
}
However, he glosses over deletion in 2-3-4 LLRBs with the following:
The code on the next page is a full implementation of delete() for LLRB 2-3 trees. It is based on the reverse of the approach used for insert in top-down 2-3-4 trees: we perform rotations and color flips on the way down the search path to ensure that the search does not end on a 2-node, so that we can just delete the node at the bottom. We use the method fixUp() to share the code for the color flip and rotations following the recursive calls in the insert() code. With fixUp(), we can leave right-leaning red links and unbalanced 4-nodes along the search path, secure that these conditions will be fixed on the way up the tree. (The approach is also effective 2-3-4 trees, but requires an extra rotation when the right node off the search path is a 4-node.)
His delete() function:
private Node delete(Node h, Key key)
{
if (key.compareTo(h.key) < 0)
{
if (!isRed(h.left) && !isRed(h.left.left))
h = moveRedLeft(h);
h.left = delete(h.left, key);
}
else
{
if (isRed(h.left))
h = rotateRight(h);
if (key.compareTo(h.key) == 0 && (h.right == null))
return null;
if (!isRed(h.right) && !isRed(h.right.left))
h = moveRedRight(h);
if (key.compareTo(h.key) == 0)
{
h.val = get(h.right, min(h.right).key);
h.key = min(h.right).key;
h.right = deleteMin(h.right);
}
else h.right = delete(h.right, key);
}
return fixUp(h);
}
My implementation correctly maintains LLRB 2-3 invariants for all tree operations on 2-3 trees, but fails for a subclass of right-sided deletions on 2-3-4 trees (these failing deletions result in right leaning red nodes, but snowball to tree imbalance and finally null pointer dereferencing). From a survey of example code that discusses LLRB trees and includes options for construction of trees in either mode, it seems that none correctly implements the deletion from a 2-3-4 LLRB (i.e. none has the extra rotation alluded to, e.g. Sedgewick's java above and here).
I'm having trouble figuring out exactly what he means by "extra rotation when the right node off the search path is a 4-node"; presumably this is a rotate left, but where and when?
If I rotate left passing upwards past a 4-node equivalent (i.e. RR node) or a right leaning 3-node equivalent (BR node) either before calling fixUp() or at the end of the fixUp function I still get the same invariant contradiction.
Here are the tree states of the smallest failing examples I have found (generated by sequential insertion of elements from 0 to the respective maximum value).
The first pair of trees shows the transition from invariant-conforming state prior to deletion of element 15 to obviously broken state after.
The second is essentially the same as above, but with deletion of 16 of 0..16 (deletion of 15 results in the same topology). Note that the invariant contradiction manages to cross the root node.
The key is going to be understanding how to revert the violations generated during the walk down the tree to the target node. The following two trees show how the first tree above looks after a walk down the left and right respectively (without deletion and before walking back up with fixUp()).
After attempt to delete '-1' without fixUp:
After attempt to delete '16' without fixUp:
Trying rotate left on the walk back up when the node has only a red right child seems to be part of the solution, but it does not deal correctly with two red right children in a row, preceding this with a flipColor when both children are red seems to improve the situation further, but still leaves some invariants violated.
If I further check whether the right child of a right child is red when its sibling is black and rotate left if this is true I only fail once, but at this point I feel like I'm needing a new theory rather than a new epicycle.
Any ideas?
For reference, my implementation is available here (No, it's not Java).
Follow-up:
Part of the reason I was interested in this was to confirm the claim by many that 2-3 LLRB trees are more efficient than 2-3-4 LLRB trees. My benchmarking has confirmed this for insertion and deletion (2-3 are about 9% faster), but I find that retrieval is very slightly faster for 2-3-4 trees.
The following times are representative and consistent across runs:
BU23:
BenchmarkInsert 1000000 1546 ns/op
BenchmarkDelete 1000000 1974 ns/op
BenchmarkGet 5000000 770 ns/op
TD234:
BenchmarkInsert 1000000 1689 ns/op
BenchmarkDelete 1000000 2133 ns/op
BenchmarkGet 5000000 753 ns/op
First column is bench name, second is number of operations, third is result. Benchmark on i5M 2.27.
I have had a look at branch lengths for 2-3 tree and 2-3-4 trees and there is little in that to explain the retrieval difference (mean distance from root to node and S.D. of 1000 trees each with 10000 random inserts):
Means:
TD234 leafs BU23 leafs
12.88940 12.84681
TD234 all BU23 all
11.79274 11.79163
StdDev:
TD234 leafs BU23 leafs
1.222458 1.257344
TD234 all BU23 all
1.874335 1.885204