0
votes

How would you go about parallelizing the radix-sort algorithm in C with OpenMP?

My program is a modification of your typical radix-sort: It sorts an array of integers based on the binary representation of the digit, where you're able to vary the number of bits that should be interpreted as one digit (which essentially will be used to get different running time based on how big your integers are).

I have a radix-function that takes three arguments:

// n is the number of elements in data
// b is number of bits that should be interpreted as one digit
void radix(int* data, int n, int b);

Further on, my radix-function iterates through all the bits (int: 32 bits) with b increments:

for(bit = 0; bit < 32; bit += b) { ... }

Which contains three parts:

  • Counting occurences of a certain digit (actually bits), to decide how much storage a bucket needs. bucket[(data[i] >> bit) & (int)(pow(2,b)-1)]++
  • Putting values in a temporary array (the buckets).

    bitval = (data[i] >> bit) & (int)(pow(2,b)-1)

    temp_data[bucket[bitval]++] = data[i]

  • Copying values from the temporary buckets to the *data pointer given to the function.

    for(i = 0; i < n; i++) { data[i] = temp_data[i] }

1

1 Answers

0
votes

Parallelization is going to be an issue because the limiting factor will be memory bandwidth (there is very little CPU overhead, and only one memory bus).

Also instead of using the floating point function pow(2,b), create a bit mask and right shift count based on b:

    numberOfBits = b;
    shiftCount = 0;
    while(1){  // main loop
        // set numberOfBuckets
        numberOfBuckets = 1 << numberOfBits;
        bitMask = numberOfBuckets - 1;
        // code to generate histogram for this field goes here
        // ...
        shiftCount += numberOfBits;
        // check for partial bit field
        if((shiftCount + numberOfBits) > (8*sizeof(unsigned int))){
            numberOfBits = (8*sizeof(unsigned int)) - shiftCount;
            shiftCount = (8*sizeof(unsigned int)) - numberOfBits;
            continue; // do partial bit field
        }
        // check for done
        if(shiftCount == (8*sizeof(unsigned int)))
            break; // done
    }

If sorting signed integers, you'll need to adjust for the most significant field (also arithmetic right shift for signed integers is compiler / platform dependent). One solution (for two's complement signed integers) is to cast to unsigned integer and complement the sign bit for bucket index generation.