Here's a breakdown on the union/find algorithm for disjoint set forests on wikipedia:
- Barebone disjoint-set forests... (
O(n)
)- ... with union by rank ... (now improved to
O(log(n)
)- ... with path compression (now improved to
O(a(n))
, effectivelyO(1)
)
- ... with path compression (now improved to
- ... with union by rank ... (now improved to
Implementing union by rank necessitates that each node keeps a rank
field for comparison purposes. My question is, is union by rank worth this additional space? What happens if I skip union by rank and just do path compression instead? Is it good enough? What is the amortized complexity now?
A comment is made that implies that union by rank without path compression (amortized O(log(n)
complexity) is sufficient for most practical application. This is correct. What I'm asking is the other way around: what if you skip union by rank and ONLY do path compression instead?
In a sense, path compression is an extra step to improve union by rank, and that's why that extra step can be omitted without disastrous consequence. But is union by rank a necessary intermediate step to path compression? Can I skip it and go straight to path compression, or will that be catastrophic?
It was also pointed out that without union by rank, repeated unions could create a linked-list like structure. This means that a single path compression operation could take O(n)
in the worst case. This would of course affect future operations, so how this plays out when amortized over many operations is what I'm more interested in.
O(n)
time, that worst-case path will become compressed because of that operation, so it is not the case that the same worst case time can occur repeatedly - at least not on the same path or any part of it. Therefore the analysis for worst-case repeated operation might be different from the worst-case single operation, I guess? – rwong