Why rank is not equal to height of the set in disjoint set?

Question

I have recently read about a data structure disjoint-set-union. I am confused about the rank heuristic. I have read that "rank is not height of the tree set".

I am unable to find this claim right. It is because when rank heuristic is used for union then easily we can see that rank is height of the tree set, but when we use path compression with it rank is no more height. But still, we are doing union considering the rank of the tree.

Suppose I am calling find(a1) and find(a2), it should change the rank of parent of a1 and a2, if height of parent of a1 lies in path from root to a1(same for a2).

How this path compression is working when rank is no more height of the tree? What does rank of the tree represents while using path-compression.

Below is code for find and union function. It is not a "complete program". But can be made into one easily.

int find(vector<int>& parent, int i) {  
    if (parent[i] != i) 
        parent[i] = find(parent, parent[i]); 

    return i; 
} 

void union(vector<int>& parent, vector<int>& rannk, int x, int y) { 
    int xroot = find(parent, x); 
    int yroot = find(parent, y);

    if (rank[xroot] < rank[yroot]) 
       parent[xroot] = yroot; 
    else if (rank[xroot] > rank[yroot]) 
        parent[yroot] = xroot; 
    else{ 
        parent[yroot] = xroot; 
        rank[xroot]++; 
    } 
}

int main() {
    vector<int> parent(10), rank(10, 1);

    for(int i = 0; i < 10; ++i)
       parent[i] = i;

    union(2, 5);
    union(1, 3);
    union(2, 7);
    union(2, 1);
    union(9, 8);
    union(0, 8);
    union(0, 1);
}

Above are just a few core functions for the sake of understanding.

parent[i] represents the parent of element i, 0<=i<=10 rank[i] represents rank of i(root) of a set, 0<=i<=10

As you can easily see, after each find() call the path is compressed from a node to its parent(now, it is directly connected to parent). Now rank is not the height of the tree set.

This code is using rank[] for determining the merging. What does rank[i] represent here and how does it help to decide to merge two sets in union()?

Matt Timmermans Matt Timmermans · Accepted Answer · 2019-01-28T03:02:27

The rank of a set doesn't represent anything except the rank. The near-constant amortized time performance of the disjoint-set structure using path compression and union-by-rank has been proven using rank, and so we continue to use union-by-rank so we can get that proven performance.

I'm pretty sure the same bound has been proven for using path compression with union-by-size, which is also easy if you want to do that.

Union-by-height is just not a viable option with path compression, since path compression makes keeping track of the height expensive. When you are not using path compression, then rank and height are the same, as you know.

Why rank is not equal to height of the set in disjoint set?

1 Answers