7
votes

DFS with coloring would take O(V+E) vs union find would take O(ElogV) reference: http://www.geeksforgeeks.org/detect-cycle-undirected-graph/

So union find approach is slower. If V = 100, E = 100, DFS = 200, Union find is 1,000. Is there a reason to use Union find? I personally like it because it produces a clean code. Or anything I missed that union find is better in real practice?

2
That article doesn't mention that union-find that both takes rank into account and uses collapsing finds is for all practical purposes amortized constant time: O(n \alpha(n)) for n union find operations, where \alpha is the inverse ackerman function. This function applied to the number of atoms in the universe equals roughly 4, so you can think of it as constant.Gene

2 Answers

8
votes

I suspect that you may be misinterpreting how big-O notation works. The notation O(V + E) doesn't mean "the runtime is computed by adding V and E), but rather "the runtime scales as a function of the sum of V and E)." For example, suppose you run DFS on a graph with 1,000 nodes and 1,000 edges and the runtime is 1ms. It's then reasonable to assume that the runtime on a graph with 2,000 nodes and 2,000 edges would be something like 2ms. However, the big-O notation by itself won't tell you what the runtime will be on some given input if you don't have some reference point established. The 1ms figure I gave here is a total guess - you'd have to run the implementation to see what runtime you get.

Similarly, the runtime O(E log V) means "the runtime scales as the product of the number of nodes and the logarithm of the number of edges.) For example, if the runtime on an input with 1,000 nodes and 1,000 edges is 1ms, then the runtime on an input with 1,000 nodes and 2,000 edges would likely be 2ms, and the runtime on an input with 1,000,000 nodes and 1,000 edges would similarly be around 2ms. Again, the only way to find out what the runtime would be on some initial input would be to run it and see what happens.

One other detail - as many other folks have pointed out, the bound given on the union find data structure is for a very inefficient union-find structure. Using a disjoint set forest with path compression and union-by-rank, you could get an asymptotic runtime of O(α(n)) per operation, where α(n) is an extremely slowly-growing function (the Ackerman inverse function) that's essentially 5 for all inputs you could fit into the universe.

With that having been said - the asymptotic runtime of DFS is better than that for the union-find approach, so it's likely to be the faster one in practice. DFS is also relatively easy to implement, so I'd recommend going for that approach.

The advantage of the union-find structure is that it's good for the incremental version of the connectivity problem where edges are continuously being added in. DFS doesn't gracefully handle this case very well.

5
votes

Union-find with path compression and union by rank will have O(E*α(n)) complexity, where α(n) is an inverse Ackermann function. It's running time would be comparable to DFS, but personally, I would use DFS, it's simpler and more obvious way to get things done.

The only reason to prefer union-find I can think of is the situation when we have some unordered list/set of edges as a graph representation, and we can't or don't want to use extra time/memory to transform this data for DFS.