5
votes

I have been reading about union-find problem. The two main improvements are path compression and union by rank. As far as I understand union by rank is used to determine how to combine disjoint trees. If we have two disjoint trees T1 and T2, then we attach the root of the tree with smaller rank to the tree with higher rank. If we don't use path compression then rank is just the depth of a tree. This makes sense since we don't want to increase the depth of out tree,since it directly affects both finds and unions.

My problem is when we use path compression as well. I keep reading that the two optimizations complement each other, but I don't see that. Because of path compression, rank is no longer depth of the tree (it becomes an upper bound on depth). Suppose T1 has 2 branches (let rank of T1 be 3), and T2 has depth 2 and rank 2. Now suppose we perform find operation (with path compression) on the leaf of T1 marked with "*" below. Now if we union root of T1 and root of T2, then T2 would be attached to the root of T1(since rank is not updated by find). The resulting tree has depth 3. But we could have had a better performance if we attached T1 to T2.

T1:   o   (Rank = 3)    T2:   o  (Rank = 2)
     / \                      |
    o   o                     o
    |                         |
    o                         o
    |   
    *   

After a find on leaf of T1("*"), then union on roots of T1 and T2 we get

T1:    o       (Rank = 3)      T2: o  (Rank = 2)       
     /| |\                         |
    * o o o                        o
                                   |
                                   o
Result of T1 union T2
       o
   / | | |\
  * o o o  o   Rank = 3 and Max Depth = 3
           |
           o
           |
           o

Am i missing something here? How do path compression and union by rank complement each other? I know rank is just an upper bound on depth of a tree but I don't see how union by rank is improving the overall performance of the structure. How is this better than a union that combines the roots randomly?

Thanks for the help in advance.

2
Why do you say "we could have had better performance if we attached T1 to T2"? If the union were followed by, say, a find on every node, then that would result in a lot more work.Matt Timmermans
My understanding of better performance is that if the tree is deeper then finds and unions take more time to locate their top representative (the root) - subsequent find and unions operations will be faster due to path compression. But that was just the work of path compression. I am just failing to understand how union by rank improves performance, because rank doesn't exactly capture depth of a tree. I guess my question could be better - what do I lose if I just implement this with just path compression and why?Hermon

2 Answers

5
votes

Union by rank ensures that the maximum depth of the tree is log N, so it puts a worst case upper bound of O(log N) on every operation.

Path compression without any special union rules an upper bound of O(log N) on the amortized cost of each operation, but doesn't limit the worst case cost. (There might even be a tighter bound on the amortized cost, but O(log N) is the one I know how to prove)

Using both together, you get the O(log N) limit on the worst case and the amortized bound improves to O(𝛼(N)), which is effectively constant. In this way the two optimizations are complementary.

You are right that there are sequences of operations for which union by rank is not absolutely optimal, but the guarantees are better than what you get without it, and that is what counts. We don't usually spend effort optimizing the best case performance. We optimize the worst or average cases.

0
votes

The graph T2 in the question, cannot be formed on the first place if you are using union by rank/weight.

Just try for yourself with 3 nodes and see if you can come up with graph T2. It's not possible. Even graph T1 cannot be formed in the first place.

You can start with an example of N nodes and if you do union by rank/weight from the beginning, you will actually see that it will give an optimal merge structure( some sequences of operations might not give the best merge, but majority of the times it gives the best results).

In other words union by rank/weight is helping the find method(which in-turn uses path compression for further optimisation) which make find operation almost constant time operation.