In my day-to-day work I often deal with datasets that contain millions of rows, sometimes hundreds of millions, and occasionally over 1 billion. These datasets often need to be sorted. The keys are almost always large integer values (usually 9 digits). Sometimes the datasets have composite keys made up of a 9-digit field and a 3-digit field.
I was wondering if it would be possible to implement an LSD (least-significant-digit-first) radix sort macro in SAS that could be used instead of PROC SORT to reduce the time spent sorting these datasets. I've already tuned the sorts: I use compression where appropriate, keep only the relevant fields (or use TAGSORT), size field lengths appropriately, avoid unnecessary sorts, etc.
The hardware I'm using has limitations: let's assume that I only have 2 GB of memory available to SAS, so the solution can't require holding all of the key values in a temporary array in memory (at least not all at one time).
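To make the idea concrete, here is a minimal sketch of an LSD radix sort, written in Python rather than SAS purely for illustration. The function name `lsd_radix_sort`, the `digits` parameter, and the choice of base 1000 (three decimal digits per pass) are assumptions made for this example, not part of any existing SAS macro.

```python
def lsd_radix_sort(keys, digits=9, base=1000):
    """Sort non-negative integers that have at most `digits` decimal digits."""
    # One pass per base-`base` "digit"; with base=1000 a 9-digit key needs 3 passes.
    passes = 0
    span = 1
    while span < 10 ** digits:
        span *= base
        passes += 1

    out = list(keys)
    divisor = 1
    for _ in range(passes):
        # Counting step: histogram of the current digit.
        counts = [0] * base
        for k in out:
            counts[(k // divisor) % base] += 1

        # Prefix sums turn the counts into the starting offset of each bucket.
        offsets = [0] * base
        total = 0
        for d in range(base):
            offsets[d] = total
            total += counts[d]

        # Stable scatter: keys with the same digit keep their relative order,
        # which is what makes least-significant-digit-first processing correct.
        scattered = [0] * len(out)
        for k in out:
            d = (k // divisor) % base
            scattered[offsets[d]] = k
            offsets[d] += 1

        out = scattered
        divisor *= base
    return out


# A composite (9-digit, 3-digit) key can be packed into one 12-digit integer
# and sorted with one extra pass.
pairs = [(123456789, 7), (123456789, 2), (5, 999), (5, 0)]
packed = [a * 1000 + b for a, b in pairs]
sorted_packed = lsd_radix_sort(packed, digits=12)
```

This sketch keeps every key in memory, so it does not meet the 2 GB constraint as written. A version that respects the constraint would presumably have to stream each pass through the data and write the buckets out to disk, and that extra I/O is the part whose cost relative to PROC SORT is unclear to me.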
Would such a solution offer a performance improvement over PROC SORT? Has anyone already implemented something like this or had experience with it? Am I wasting my time?