One way to solve this is to rethink how your binary search tree containing the cumulative totals is built. Rather than building a binary search tree, think about having each node interpreted as follows:
- Each node stores a range of values that are dedicated to the node itself.
- Nodes in the left subtree represent sampling from the probability distribution just to the left of that range.
- Nodes in the right subtree represent sampling from the probability distribution just to the right of that range.
For example, suppose our weights are 3, 2, 2, 2, 2, 1, and 1 for events A, B, C, D, E, F, and G. We build this binary tree holding A, B, C, D, E, F, and G:
D
/ \
B F
/ \ / \
A C E G
Now, we annotate the tree with probabilities. Since A, C, E, and G are all leaves, we give each of them probability mass one:
D
/ \
B F
/ \ / \
A C E G
1 1 1 1
Now, look at the tree for B. B has weight 2 of being chosen, A has weight 3 of being chosen, and C has probability 2 of being chosen. If we normalize these to the range [0, 1), then A accounts for 3/7 of the probability and B and C each account for 2/7s. Thus we have the node for B say that anything in the range [0, 3/7) goes to the left subtree, anything in the range [3/7, 5/7) maps to B, and anything in the range [5/7, 1) maps to the right subtree:
D
/ \
B F
[0, 3/7) / \ [5/7, 1) / \
A C E G
1 1 1 1
Similarly, let's process F. E has weight 2 of being chosen while F and G each have probability weight 1 of being chosen. Thus the subtree for E accounts for 1/2 of the probability mass here, the node F accounts for 1/4, and the subtree for G accounts for 1/4. This means we can assign probabilities as
D
/ \
B F
[0, 3/7) / \ [5/7, 1) [0, 1/2) / \ [3/4, 1)
A C E G
1 1 1 1
Finally, let's look at the root. The combined weight of the left subtree is 3 + 2 + 2 = 7. The combined weight of the right subtree is 2 + 1 + 1 = 4. The weight of D itself is 2. Thus the left subtree has probability 7/13 of being picked, D has probability 2/13 of being picked, and the right subtree has probability 4/13 of being picked. We can thus finalized the probabilities as
D
[0, 7/13) / \ [9/13, 1)
B F
[0, 3/7) / \ [5/7, 1) [0, 1/2) / \ [3/4, 1)
A C E G
1 1 1 1
To generate a random value, you would repeat the following:
- Starting at the root:
- Choose a uniformly-random value in the range [0, 1).
- If it's in the range for the left subtree, descend into it.
- If it's in the range for the right subtree, descend into it.
- Otherwise, return the value corresponding to the current node.
The probabilities themselves can be determined recursively when the tree is built:
- The left and right probabilities are 0 for any leaf node.
- If an interior node itself has weight W, its left tree has total weight WL, and its right tree has total weight WR, then the left probability is (WL) / (W + WL + WR) and the right probability is (WR) / (W + WL + WR).
The reason that this reformulation is useful is that it gives us a way to update probabilities in O(log n) time per probability updated. In particular, let's think about what invariants are going to change if we update some particular node's weight. For simplicity, let's assume the node is a leaf for now. When we update the leaf node's weight, the probabilities are still correct for the leaf node, but they're incorrect for the node just above it, because the weight of one of that node's subtrees has changed. Thus we can (in O(1) time) recompute the probabilities for the parent node by just using the same formula as above. But then the parent of that node no longer has the correct values because one of its subtree weights has changed, so we can recompute the probability there as well. This process repeats all the way back up to the root of the tree, with us doing O(1) computation per level to rectify the weights assigned to each edge. Assuming that the tree is balanced, we therefore have to do O(log n) total work to update one probability. The logic is identical if the node isn't a leaf node; we just start somewhere in the tree.
In short, this gives
- O(n) time to construct the tree (using a bottom-up approach),
- O(log n) time to generate a random value, and
- O(log n) time to update any one value.
Hope this helps!
n/2
elements, that is not "better" thanO(n)
becauseO(n/2) = O(n)
. - Michael McGowan