5
votes

I was wondering if anyone can point me to a linear-time algorithm for finding the MST of a graph when there is only a small number of distinct edge weights (i.e., edges can only have 2 different weights).

I could not find anything on Google other than Prim's, Kruskal's, and Borůvka's algorithms, none of which seem to have any properties that would reduce the running time in this special case. I'm guessing that to make it linear time, it would have to be some sort of modification of BFS (which finds the MST when the weights are uniform).


2 Answers

4
votes

The cause of the lg V factor in Prim's O(E lg V) runtime is the heap that is used to find the next candidate edge. I'm pretty sure that it is possible to design a priority queue that does insertion and removal in constant time when there's a limited number of possible weights, which would reduce Prim's algorithm to O(V + E).

For the priority queue, I believe it would suffice to use an array whose indices cover all the possible weights, where each element points to a linked list containing the elements with that weight. You'd still have a factor of d (the number of distinct weights) for figuring out which list to take the next element from (the lowest non-empty one), but if d is a constant, you'll be fine.
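A minimal sketch of such a bucket-based queue, assuming integer weights in range(d) (the class and method names are just illustrative):

```python
from collections import deque

class BucketQueue:
    """Priority queue for a small fixed set of integer weights 0..d-1."""

    def __init__(self, d):
        # One list (bucket) per possible weight.
        self.buckets = [deque() for _ in range(d)]

    def insert(self, weight, item):
        # O(1): append to the bucket for this weight.
        self.buckets[weight].append(item)

    def extract_min(self):
        # O(d): scan for the lowest non-empty bucket.
        for bucket in self.buckets:
            if bucket:
                return bucket.popleft()
        raise IndexError("extract_min from empty queue")
```

With d constant, each operation is O(1), so the lg V factor from the heap disappears.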

1
votes

Elaborating on Aasmund Eldhuset's answer: if the edge weights are restricted to integers in the range 0, 1, 2, 3, ..., U-1, then you can adapt many of the existing algorithms to run in (near-)linear time if U is a constant.

For example, let's take Kruskal's algorithm. The first step in Kruskal's algorithm is to sort the edges into ascending order of weight. You can do this in time O(m + U) if you use counting sort or time O(m lg U) if you use radix sort. If U is a constant, then both of these sorting steps take linear time. Consequently, the runtime for running Kruskal's algorithm in this case would be O(m α(m)), where α(m) is the inverse Ackermann function, because the limiting factor is going to be the runtime of maintaining the disjoint-set forest.
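Here's a sketch of that approach, assuming integer weights in range(U); the function name and edge format are my own choices, and the disjoint-set forest uses path halving for brevity:

```python
def kruskal_counting_sort(n, edges, U):
    """Kruskal's algorithm with counting sort; edges are (weight, u, v)
    tuples with 0 <= weight < U, and nodes are numbered 0..n-1."""
    # Counting sort: drop each edge into the bucket for its weight, O(m + U).
    buckets = [[] for _ in range(U)]
    for w, u, v in edges:
        buckets[w].append((u, v))

    # Disjoint-set forest with path halving.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst_weight, mst_edges = 0, []
    for w in range(U):           # sweep weights in ascending order
        for u, v in buckets[w]:
            ru, rv = find(u), find(v)
            if ru != rv:         # edge connects two components: take it
                parent[ru] = rv
                mst_weight += w
                mst_edges.append((u, v, w))
    return mst_weight, mst_edges
```

The sort is O(m + U), so the disjoint-set operations dominate, giving the near-linear O(m α(m)) bound mentioned above.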

Alternatively, look at Prim's algorithm. You need to maintain a priority queue of the candidate distances to the nodes. If you know that all the edge weights are in the range [0, U), then you can do this in a super naive way by just storing an array of U buckets, one per possible priority. Inserting into the priority queue then just requires you to dump an item into the right bucket. You can do a decrease-key by evicting an element and moving it to a lower bucket. You can then do a find-min by scanning the buckets. This causes the algorithm runtime to be O(m + nU), which is linear if U is a constant.
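A sketch of this bucket-based Prim's, assuming a connected graph with integer weights in range(U); rather than evicting entries on decrease-key, this version uses the common lazy-deletion trick (stale entries are skipped during find-min), which keeps the same O(m + nU) bound:

```python
def prim_bucket(adj, U):
    """Prim's algorithm with a bucket array as the priority queue.
    adj: adjacency list {u: [(v, w), ...]} with 0 <= w < U, nodes 0..n-1."""
    n = len(adj)
    dist = [float('inf')] * n   # cheapest known edge linking each node to the tree
    in_tree = [False] * n
    dist[0] = 0
    buckets = [[] for _ in range(U)]
    buckets[0].append(0)        # start from node 0 with priority 0
    total = 0
    for _ in range(n):
        # find-min: scan buckets low to high, discarding stale entries.
        u = None
        for bucket in buckets:
            while bucket:
                cand = bucket.pop()
                if not in_tree[cand]:
                    u = cand
                    break
            if u is not None:
                break
        in_tree[u] = True
        total += dist[u]
        for v, w in adj[u]:
            if not in_tree[v] and w < dist[v]:
                dist[v] = w
                buckets[w].append(v)  # lazy decrease-key: old entry left stale
    return total
```

Each of the n find-min scans touches at most U buckets plus stale entries, and every edge is relaxed once from each endpoint, which gives the O(m + nU) runtime.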