
I'm trying to learn parallel programming with OpenMP, and I'm interested in parallelizing the following do-while loop, which has several while loops inside it:

do {
    while(left < (length - 1) && data[left] <= pivot) left++;
    while(right > 0 && data[right] >= pivot) right--;

    /* swap elements */
    if(left < right){
        temp = data[left];
        data[left] = data[right];
        data[right] = temp;
    }
} while(left < right);

I haven't figured out how to parallelize while and do-while loops, and I couldn't find any resource that specifically describes how to do it. I have found instructions for for loops, but I couldn't infer anything about while and do-while loops from those. So, could you please describe how I can parallelize the loops I have provided here?

EDIT

I have transformed the do-while loop into the following code, where only for loops are used.

for(i = 1; i<length-1; i++)
{
    if(data[left] > pivot)
    {
        i = length;
    }
    else
    {
        left = i;
    }

}

for(j=length-1; j > 0; j--)
{
    if(data[right] < pivot)
    {
        j = 0;
    }
    else
    {
        right = j;
    }
}

/* swap elements */
if(left < right)
{
    temp = data[left];
    data[left] = data[right];
    data[right] = temp;
}

int leftCopy = left;
int rightCopy = right;

for(int leftCopy = left; leftCopy<right;leftCopy++)
{
    for(int new_i = left; new_i<length-1; new_i++)
    {
        if(data[left] > pivot)
        {
            new_i = length;
        }
        else
        {
            left = new_i;
        }
    }

    for(int new_j=right; new_j > 0; new_j--)
    {
        if(data[right] < pivot)
        {
            new_j = 0;
        }
        else
        {
            right = new_j;
        }
    }
    leftCopy = left;
    /* swap elements */
    if(left < right)
    {
        temp = data[left];
        data[left] = data[right];
        data[right] = temp;
    }
}

This code works fine and produces the correct result, but when I tried to parallelize parts of the code above by changing the first two for loops to the following:

#pragma omp parallel default(none) firstprivate(left) private(i,tid) shared(length, pivot, data)
    {
#pragma omp for
        for(i = 1; i<length-1; i++)
        {
            if(data[left] > pivot)
            {
                i = length;
            }
            else
            {
                left = i;
            }
        }
    }


#pragma omp parallel default(none) firstprivate(right) private(j) shared(length, pivot, data)
    {
#pragma omp for
        for(j=length-1; j > 0; j--)
        {
            if(data[right] < pivot)
            {
                j = 0;
            }
            else
            {
                right = j;
            }
        }
    }

The speed is worse than that of the non-parallelized code. Please help me identify my problem.

Thanks

What is the typical value for length? - damienfrancois
Before you use OpenMP, simply think about how the task can be done in parallel at all. Imagine you have several workers you can hand tasks to: your threads. Now, what do you do with a while? What exactly can be done in parallel in a while? With a for, you can easily say "the for runs over an index, and for each index the computation can be done in parallel": hand each worker an index, or tell them to fish an index out of a bucket, handle it, and then get the next one. But in something like while(true){ if(condition){break;} do_stuff; }? I don't see a general concept there. - Aziuth
As for sorting, how about going with merge sort? Take the array, divide it into T intervals for T threads, sort them in parallel, and then merge them sequentially (a sketch follows these comments). Merging takes O(N); merge sorting the intervals takes the usual O(N log N) but is independent, and can therefore be done in parallel. For large N, the merging is dominated by the independent sorting within the intervals. That is, if you really want to do it as an exercise. If you just want something sorted in parallel, get a library that does it. You won't be able to compete with a good library. - Aziuth
(Note that "can be done in parallel" is a theoretical concept. For large arrays, you will encounter the problem that the threads share the same cache. That's the big reason to use libraries: the people who wrote them knew about this problem and will probably even state how many threads you should use, which is most likely not the maximum your machine can create and depends on your processor and cache types.) - Aziuth
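A minimal sketch of the chunk-and-merge approach suggested in the comment above, assuming C with OpenMP; the function name chunk_sort, the chunk count T, and the use of qsort for the per-chunk sort are illustrative assumptions, not something from this thread:

#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort data[] by sorting T contiguous chunks in parallel,
   then merging the sorted chunks sequentially, pass by pass. */
void chunk_sort(int *data, int length, int T)
{
    int chunk = (length + T - 1) / T;

    /* fixed trip count T: each thread sorts its own chunk */
    #pragma omp parallel for
    for (int t = 0; t < T; t++) {
        int lo = t * chunk;
        int hi = lo + chunk < length ? lo + chunk : length;
        if (lo < hi)
            qsort(data + lo, hi - lo, sizeof(int), cmp_int);
    }

    /* sequential pairwise merges, O(N) work per pass */
    int *tmp = malloc(length * sizeof(int));
    for (int width = chunk; width < length; width *= 2) {
        for (int lo = 0; lo < length; lo += 2 * width) {
            int mid = lo + width < length ? lo + width : length;
            int hi  = lo + 2 * width < length ? lo + 2 * width : length;
            int i = lo, j = mid, k = lo;
            while (i < mid && j < hi)
                tmp[k++] = data[i] <= data[j] ? data[i++] : data[j++];
            while (i < mid) tmp[k++] = data[i++];
            while (j < hi)  tmp[k++] = data[j++];
        }
        memcpy(data, tmp, length * sizeof(int));
    }
    free(tmp);
}

Each chunk is a fixed, data-independent piece of the iteration space, so the omp for distributes cleanly; the merge passes stay sequential, as the comment suggests.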

1 Answer


First of all, sorting algorithms are very hard to parallelize with OpenMP parallel loops, because the loop trip count is not deterministic: it depends on the input values, which are read on every iteration.

I don't think loop conditions such as data[left] <= pivot are going to work well, since the OpenMP runtime cannot know how to partition such an iteration space among the threads.

If you are still interested in parallel sorting algorithms, I suggest you read the literature first, to see which algorithms are really worth implementing due to their scalability. If you just want to learn OpenMP, I suggest you start with easier algorithms such as bucket sort, where the number of buckets is well known and does not frequently change.
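For illustration, here is a minimal bucket-sort sketch, assuming C with OpenMP, non-negative values below a known max_value, and a fixed bucket count; the names and the per-bucket qsort are my own assumptions, not part of this answer:

#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Bucket sort sketch: values assumed in [0, max_value). The bucket
   count is fixed up front, so the parallel loop below has a known,
   data-independent trip count. */
void bucket_sort(int *data, int length, int max_value, int nbuckets)
{
    int **buckets = malloc(nbuckets * sizeof(int *));
    int *counts = calloc(nbuckets, sizeof(int));
    for (int b = 0; b < nbuckets; b++)
        buckets[b] = malloc(length * sizeof(int));

    /* scatter: kept sequential to keep the sketch simple; a parallel
       version would need per-thread counts or atomics */
    for (int i = 0; i < length; i++) {
        int b = (long long)data[i] * nbuckets / max_value;
        buckets[b][counts[b]++] = data[i];
    }

    /* prefix sums: where each bucket lands back in data[] */
    int *offsets = malloc(nbuckets * sizeof(int));
    int total = 0;
    for (int b = 0; b < nbuckets; b++) {
        offsets[b] = total;
        total += counts[b];
    }

    /* each bucket is sorted and written back independently */
    #pragma omp parallel for
    for (int b = 0; b < nbuckets; b++) {
        qsort(buckets[b], counts[b], sizeof(int), cmp_int);
        for (int i = 0; i < counts[b]; i++)
            data[offsets[b] + i] = buckets[b][i];
    }

    for (int b = 0; b < nbuckets; b++) free(buckets[b]);
    free(buckets); free(counts); free(offsets);
}

Because nbuckets is fixed before the loop starts, the final omp for has a data-independent trip count, which is exactly the property the quicksort partitioning loops lack.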

Regarding the example you are trying to parallelize: while loops are not directly supported by OpenMP, because the number of iterations (the loop trip count) is not deterministic (otherwise, they are easy to transform into for loops), so it is not possible to distribute the iterations among the threads. In addition, it is common for a while loop to check a condition using the previous iteration's result. This is called a Read-after-Write, or true dependency, and cannot be parallelized.
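For contrast, a while loop whose exit condition does not depend on the data it processes can be rewritten mechanically into a for loop. A hypothetical summation example (not from the question), assuming C with OpenMP:

/* A while loop whose trip count is fixed before it starts
   (the exit condition never reads the array being processed): */
double sum_while(const double *data, int length)
{
    double sum = 0.0;
    int i = 0;
    while (i < length) {
        sum += data[i];
        i++;
    }
    return sum;
}

/* The same computation as a canonical for loop: OpenMP can now
   split the iterations, and reduction() resolves the only
   cross-iteration dependency (the accumulation into sum). */
double sum_for(const double *data, int length)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < length; i++)
        sum += data[i];
    return sum;
}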

Your slowdown problem might be alleviated if you minimize the number of omp parallel regions and move them out of all your loops. Each parallel region may create and join a team of threads for the parallel part of the code, which is expensive.

You can still synchronize the threads inside a single parallel region so that the outcome is the same. In fact, all threads wait for each other at the end of an omp for construct by default (an implicit barrier), which makes this even easier.

#pragma omp parallel default(none) firstprivate(right,left) private(i,j) shared(length, pivot, data)
{
    /* one parallel region: the thread team is created and joined once */
    #pragma omp for
    for(i = 1; i<length-1; i++)
    {
        if(data[left] > pivot)
        {
            i = length;
        }
        else
        {
            left = i;
        }
    }
    /* implicit barrier here: every thread finishes the loop above
       before any thread starts the loop below */

    #pragma omp for
    for(j=length-1; j > 0; j--)
    {
        if(data[right] < pivot)
        {
            j = 0;
        }
        else
        {
            right = j;
        }
    }
} // end omp parallel