So there's a number of issues here.
So first, as has been pointed out, i and j (and temp) need to be private; m and a need to be shared. A useful thing to do with openmp is to use default(none), that way you are forced to think through what each variable you use in the parallel section does, and what it needs to be. So this
#pragma omp parallel for private (i,j,temp) shared(a,m) default(none)
is a good start. Making m private in particular is a bit of a disaster, because it means that m is undefined inside the parallel region. The loop, by the way, should start with m = n/2, not m=2.
In addition, you don't need the critical region -- or you shouldn't, for a shell sort. The issue, we'll see in a second, is not so much multiple threads working on the same elements. So if you get rid of those things, you end up with something that almost works, but not always. And that brings us to the more fundamental problem.
The way a shell sort works is, basically, you break the array up into many (here, m
) subarrays, and insertion-sort them (very fast for small arrays), and then reassemble; then continue by breaking them up into fewer and fewer subarrays and insertion sort (very fast, because they're partly sorted). Sorting those many subarrays is somethign that can be done in parallel. (In practice, memory contention will be a problem with this simple approach, but still).
Now, the code you've got does that in serial, but it can't be counted on to work if you just wrap the j loop in an omp parallel for
. The reason is that each iteration through the j loop does one step of one of the insertion sorts. The j+m'th loop iteration does the next step. But there's no guarantee that they're done by the same thread, or in order! If another thread has already done the j+m'th iteration before the first does the j'th, then the insertion sort is messed up and the sort fails.
So the way to make this work is to rewrite the shell sort to make the parallelism more explicit - to not break up the insertion sort into a bunch of serial steps.
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
void insertionsort(int a[], int n, int stride) {
for (int j=stride; j<n; j+=stride) {
int key = a[j];
int i = j - stride;
while (i >= 0 && a[i] > key) {
a[i+stride] = a[i];
i-=stride;
}
a[i+stride] = key;
}
}
void shellsort(int a[], int n)
{
int i, m;
for(m = n/2; m > 0; m /= 2)
{
#pragma omp parallel for shared(a,m,n) private (i) default(none)
for(i = 0; i < m; i++)
insertionsort(&(a[i]), n-i, m);
}
}
void printlist(char *s, int a[], int n) {
printf("%s\n",s);
for (int i=0; i<n; i++) {
printf("%d ", a[i]);
}
printf("\n");
}
int checklist(int a[], int n) {
int result = 0;
for (int i=0; i<n; i++) {
if (a[i] != i) {
result++;
}
}
return result;
}
void seedprng() {
struct timeval t;
gettimeofday(&t, NULL);
srand((unsigned int)(1000000*(t.tv_sec)+t.tv_usec));
}
int main(int argc, char **argv) {
const int n=100;
int *data;
int missorted;
data = (int *)malloc(n*sizeof(int));
for (int i=0; i<n; i++)
data[i] = i;
seedprng();
for (int i=0; i<n; i++) {
int i1 = rand() % n;
int i2 = rand() % n;
int tmp = data[i1];
data[i1] = data[i2];
data[i2] = tmp;
}
printlist("Unsorted List:",data,n);
shellsort2(data,n);
printlist("Sorted List:",data,n);
missorted = checklist(data,n);
if (missorted != 0) printf("%d missorted nubmers\n",missorted);
return 0;
}