Radix sort most significant first or least significant, which is faster?

Question

I have been working on a radix sort implementation (code so far pasted below). Code is in Java but should work just as well in C/C++. As you can see from the implementation I am doing it most significant bit first, the 31st bit for integers. This would seem to be faster because once a subgroup is complete, it no longer needs to be iterated.

For example, as an analogy, imagine sorting words and you only have one word that begins with "A". Once you see the A and place the word at the beginning of the list, you no longer have to examine any further characters in the word. On the other hand, if you begin from the end of the word, you would have to look at every letter before you figured out it belonged at the beginning of the list.

So, based on this idea, I would think MSB order would be fastest, but am I missing anything? Is LSB just as fast for some reason? I know that LSB performs "stable sort", but I don't see any practical benefit to this.

public static final int[] RadixSort_unsigned_1( int[] values1 ){ // one based key sorting
    int[] keys = new int[ values1.length ];
    int ctValues = values1[0];
    keys[0] = ctValues;
    for( int xKey = 1; xKey <= ctValues; xKey++ ) keys[xKey] = xKey;
    int iFrameListSize = (int)Math.sqrt( (double)ctValues ) + 2;
    int[] nextBottom = new int[ iFrameListSize ];
    int[] nextTop = new int[ iFrameListSize ];
    int ctFramesRemaining = 1;
    int ctFramesInNextRadix = 0;
    nextBottom[ 1 ] = 1; // the frame information is maintained in a circular queue
    nextTop[ 1 ] = ctValues;
    int xFrame = 1;
    int xFrameNextRadix = 2;
    int radix = 32;
    while( radix > 0 ){
        while( ctFramesRemaining > 0 ){ // go through all the frames and sort them
            int xLow = nextBottom[ xFrame ];
            int xHigh = nextTop[ xFrame ];
            while( true ){ // sort frame
                while( values1[ keys[ xLow ] ] == 0 ) xLow++;
                while( values1[ keys[ xHigh ] ] == 1 ) xHigh--;
                if( xLow > xHigh ) break;
                int iLowKey = keys[xLow]; // exchange high and low
                keys[xLow] = keys[xHigh];
                keys[xHigh] = iLowKey;
            }
            if( xHigh > nextBottom[ xFrame ] ){ // add a lower frame
                if( xLow < nextTop[ xFrame ] ){ // and also add an upper frame
                    xFrameNextRadix++;
                    nextBottom[ xFrameNextRadix ] = nextBottom[ xFrame ]; // bottom remains the same
                    nextTop[ xFrameNextRadix ] = xHigh;
                    xFrameNextRadix++;
                    nextBottom[ xFrameNextRadix ] = xLow;
                    nextTop[ xFrameNextRadix ] = nextTop[ xFrame ]; // top remains the same
                    ctFramesInNextRadix += 2;
                } else { // just add the lower frame
                    xFrameNextRadix++;
                    nextBottom[ xFrameNextRadix ] = nextBottom[ xFrame ]; // bottom remains the same
                    nextTop[ xFrameNextRadix ] = xHigh;
                    ctFramesInNextRadix++;
                }
            } else if( xLow < nextTop[ xFrame ] ){ // just add the upper frame
                xFrameNextRadix++;
                nextBottom[ xFrameNextRadix ] = xLow;
                nextTop[ xFrameNextRadix ] = nextTop[ xFrame ]; // top remains the same
                ctFramesInNextRadix++;
            } // otherwise add no new frames
            ctFramesRemaining--;
        }
        if( ctFramesInNextRadix == 0 ) break; // done
        radix--;
    }
    return keys;
}

Note that in this implementation, a "radix" is a binary radix, i.e., a bit.

UPDATE

BTW In Java, this runs 5 times faster than the built in Arrays.sort (when I do an in-place sort, not a keyed sort), which is pretty cool.

@cHao the timing varies a lot, even for large arrays; for example, for an array with 10 million elements it can take anywhere from 120 to 160 milliseconds to sort it — Tyler Durden

Niklas B. Niklas B. · Accepted Answer · 2014-03-08T22:17:46

So, based on this idea, I would think MSB order would be fastest, but am I missing anything?

From my experience, recursive MSD radix sort is indeed faster than a LSD radix sort implementation. The reason for this however is not mainly the one you mentioned (which is valid, but not very relevant in practice), but a combination of these two:

cache efficiency: MSD lends itself to a recursive implementation. If the digits of the sorted objects (numbers, strings, ...) are reasonably randomly distributed, from a certain recursion depth onwards the whole subproblem fits into the faster CPU caches. Reducing the number of cache misses is in my experience the most important constant optimization you can apply when designing an algorithm, because the main memory is really slow compared to a typical CPU
Beneath a certain problem size you can use insertion sort. If the sorted objects are small enough (for example, if you sort integers) and if the whole subarray fits into cache, insertion sort might well be faster than any other sorting algorithm out there.

Your implementation is not recursive, so it might not have these advantages, depending on the order in which it solves the subproblems (I haven't really analysed the algorithm, but if you use a queue instead of a stack, you probably don't have very good cache locality).

I know that LSB performs "stable sort", but I don't see any practical benefit to this.

There are several applications where you need the stability property. One that comes to my mind is suffix array construction. I've written about a simple O(n log n) algorithm to do this as an answer to another SO question. It uses radix sort and requires that the sorting is stable. There are in fact stable variations of MSD radix sort, but they require additional memory. I don't know how they compare to LSD approaches.

Radix sort most significant first or least significant, which is faster?

1 Answers