I have been working on a radix sort implementation (code so far pasted below). Code is in Java but should work just as well in C/C++. As you can see from the implementation I am doing it most significant bit first, the 31st bit for integers. This would seem to be faster because once a subgroup is complete, it no longer needs to be iterated.
For example, as an analogy, imagine sorting words and you only have one word that begins with "A". Once you see the A and place the word at the beginning of the list, you no longer have to examine any further characters in the word. On the other hand, if you begin from the end of the word, you would have to look at every letter before you figured out it belonged at the beginning of the list.
So, based on this idea, I would think MSB order would be fastest, but am I missing anything? Is LSB just as fast for some reason? I know that LSB performs "stable sort", but I don't see any practical benefit to this.
public static final int[] RadixSort_unsigned_1( int[] values1 ){ // one based key sorting
int[] keys = new int[ values1.length ];
int ctValues = values1[0];
keys[0] = ctValues;
for( int xKey = 1; xKey <= ctValues; xKey++ ) keys[xKey] = xKey;
int iFrameListSize = (int)Math.sqrt( (double)ctValues ) + 2;
int[] nextBottom = new int[ iFrameListSize ];
int[] nextTop = new int[ iFrameListSize ];
int ctFramesRemaining = 1;
int ctFramesInNextRadix = 0;
nextBottom[ 1 ] = 1; // the frame information is maintained in a circular queue
nextTop[ 1 ] = ctValues;
int xFrame = 1;
int xFrameNextRadix = 2;
int radix = 32;
while( radix > 0 ){
while( ctFramesRemaining > 0 ){ // go through all the frames and sort them
int xLow = nextBottom[ xFrame ];
int xHigh = nextTop[ xFrame ];
while( true ){ // sort frame
while( values1[ keys[ xLow ] ] == 0 ) xLow++;
while( values1[ keys[ xHigh ] ] == 1 ) xHigh--;
if( xLow > xHigh ) break;
int iLowKey = keys[xLow]; // exchange high and low
keys[xLow] = keys[xHigh];
keys[xHigh] = iLowKey;
}
if( xHigh > nextBottom[ xFrame ] ){ // add a lower frame
if( xLow < nextTop[ xFrame ] ){ // and also add an upper frame
xFrameNextRadix++;
nextBottom[ xFrameNextRadix ] = nextBottom[ xFrame ]; // bottom remains the same
nextTop[ xFrameNextRadix ] = xHigh;
xFrameNextRadix++;
nextBottom[ xFrameNextRadix ] = xLow;
nextTop[ xFrameNextRadix ] = nextTop[ xFrame ]; // top remains the same
ctFramesInNextRadix += 2;
} else { // just add the lower frame
xFrameNextRadix++;
nextBottom[ xFrameNextRadix ] = nextBottom[ xFrame ]; // bottom remains the same
nextTop[ xFrameNextRadix ] = xHigh;
ctFramesInNextRadix++;
}
} else if( xLow < nextTop[ xFrame ] ){ // just add the upper frame
xFrameNextRadix++;
nextBottom[ xFrameNextRadix ] = xLow;
nextTop[ xFrameNextRadix ] = nextTop[ xFrame ]; // top remains the same
ctFramesInNextRadix++;
} // otherwise add no new frames
ctFramesRemaining--;
}
if( ctFramesInNextRadix == 0 ) break; // done
radix--;
}
return keys;
}
Note that in this implementation, a "radix" is a binary radix, i.e., a bit.
UPDATE
BTW In Java, this runs 5 times faster than the built in Arrays.sort (when I do an in-place sort, not a keyed sort), which is pretty cool.