omp parallel for loop (reduction to find max) ran slower than serial codes

Question

I am new in using OpenMP. I think that use max reduction clause to find the max element of an array is not such a bad idea, but in fact the parallel for loop ran much slower than serial one.

int main() {
double sta, end, elapse_t;
int bsize = 46000;
int q = bsize;
int max_val = 0;
double *buffer;
buffer = (double*)malloc(bsize*sizeof(double));
srand(time(NULL));
for(int i=0;i<q;i++)
    buffer[i] = rand()%10000;

sta = omp_get_wtime();
//int i;
#pragma omp parallel for reduction(max : max_val)
for(int i=0;i<q; i++)
{
    max_val = max_val > buffer[i] ? max_val : buffer[i];
}
end = omp_get_wtime();
printf("parallel maximum time %f\n", end-sta);

sta = omp_get_wtime();
for(int i=0;i<q; i++)
{
    max_val = max_val > buffer[i] ? max_val : buffer[i];
}
end = omp_get_wtime();
printf("serial maximum time   %f\n", end-sta);

free(buffer); 
return 0;}

Compile command

gcc-7 kp_omp.cpp -o kp_omp -fopenmp

Execution results

./kp_omp 
parallel maximum time 0.000505
serial maximum time   0.000266

As for the CPU, it is an Intel Core i7-6700 with 8 cores.

Stram Stram · Accepted Answer · 2017-10-02T14:47:40

Whenever you parallelise a loop openMP needs to perform some operations, for example creating the threads. Those operations result in some overhead and this in turns implies that, for each loop, there is a minimum number of iterations under which it is not convenient to parallelise.

If I execute your code I obtain the same results you have:

./kp_omp
parallel maximum time 0.000570
serial maximum time   0.000253

However if I modify bsize in line 8 such that

int bsize = 100000;

I obtain

./kp_omp
parallel maximum time 0.000323
serial maximum time   0.000552

So the parallelised version got faster than the sequential. Part of the challenges you encounter when you try to speedup the execution of a code is to understand when it is convenient to parallelise and when the overhead of the parallelisation would kill your expected gain in performance.

omp parallel for loop (reduction to find max) ran slower than serial codes

1 Answers