12
votes

Suppose that you have an undirected graph with nonnegative, integer edge lengths in the range 0 to U - 1, inclusive. What is the fastest algorithm for computing a minimum spanning tree of this graph? We can still use existing minimum spanning tree algorithms, such as Kruskal's algorithm (O(m log n)) or Prim's algorithm (O(m + n log n)). However, for cases where U is small, I think it should be possible to do much better than this.

Are there any algorithms that exploit the fact that the edge lengths are constrained to this range, and that are competitive with the more traditional MST algorithms?

Thanks!

Are the lengths also restricted to be integers, or just restricted to that range? – harold
@harold – They're integers. I'll post a correction. – templatetypedef
Several sources mention there's a linear-time algorithm for that, but link to something I can't view. – harold
If the edge weights are restricted to integers, can you use a hash-table-based priority queue rather than a heap (or whatever comparison-based O(log(n))-style structure)? As usual with a hash queue, all items with the same integer cost are placed in the same bucket, and you get worst-case O(1) operations. I assume this would remove the O(log(n)) factor from the complexity of the algorithms you've mentioned, leaving something like O(m + n)... – Darren Engwirda
@DarrenEngwirda is spot on here – if U is generally small, you can bucketize in linear time, and then scan through, adding an edge from the lowest-weight bucket if and only if it connects a new vertex. An excellent (in terms of both improvement and simplicity) optimization. – James

2 Answers

9
votes

Fredman–Willard gave an O(m + n) algorithm for integer edge lengths on the unit-cost RAM.

That's arguably not much of an improvement: without the restriction on edge lengths (i.e., the lengths are an opaque data type that supports only comparisons), Chazelle gave an O(m alpha(m, n) + n) algorithm (alpha is the inverse Ackermann function) and Karger–Klein–Tarjan gave a randomized O(m + n) algorithm.

I don't think Darren's idea leads to an O(m + n + U)-time algorithm. Jarník ("Prim") does not use its priority queue monotonically (a decrease-key can drop a vertex's key below keys that have already been extracted), so buckets may be scanned multiple times; Kruskal requires a disjoint-set data structure, which cannot run in O(m + n) time.

4
votes

With integer edge weights you can use bucketing to build a priority queue with worst-case O(1) time per operation, at the cost of an additional O(U) space.

Within the MST algorithms you've mentioned, you should be able to replace the comparison-based priority queues with this integer structure, and hence remove the O(log(n)) dependence from the complexity requirements. I expect you'd end up with an overall complexity in the style of O(n + m).

Essentially you set up a set of compressed linked lists, where each list is indexed by the (integer!) cost associated with that bucket:

struct bucket_list
{
    int *_cost; // array[0..N-1] holding the current cost of each item

    int *_head; // array[0..U-1] holding the index of the first item in each bucket

    int *_next; // array[0..N-1] where _next[i] is the item after
                // item i in its bucket list

    int *_prev; // array[0..N-1] where _prev[i] is the item before
                // item i in its bucket list
};

This structure is based on the fact that each item can only be in a single bucket list at once.

Based on this structure you can achieve worst-case O(1) complexity for these operations:

push(item, cost);   // push an item onto the head of the appropriate bucket list

pop(item, cost);    // pop an item from (anywhere!) within a bucket list

update(item, old_cost, new_cost); // move an item between buckets by combining
                                  // pop and push

To use this structure as a priority queue, maintain an index pointing at the current minimum bucket to scan. When you want the next-lowest-cost item, pop the head item from this bucket; if the bucket is empty, increment the bucket index until you find a non-empty one.
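Continuing the sketch above, extracting the next item might look something like this; min_bucket is the caller-maintained "current minimum bucket" index (not part of the struct itself), and num_buckets would be U:

// pop_min: advance *min_bucket to the first non-empty bucket and remove
// its head item; returns the item, or -1 if the queue is empty
int pop_min(struct bucket_list *q, int *min_bucket, int num_buckets)
{
    while (*min_bucket < num_buckets && q->_head[*min_bucket] == -1)
        (*min_bucket)++; // skip empty buckets

    if (*min_bucket == num_buckets)
        return -1; // no items left in any bucket

    int item = q->_head[*min_bucket];
    pop(q, item);
    return item;
}

Note that this forward-only scan is only valid while the minimum never decreases; as the other answer points out, Prim's decrease-key can place an item in a bucket below *min_bucket, in which case the index has to move backwards (e.g. be reset to the new cost on each update) and buckets may be scanned more than once.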

Of course, if U becomes very large, the extra space requirement and the possibility of a sparse distribution of items over the buckets may make this kind of approach unattractive.