I was looking at the question "omp parallel for loop (reduction to find max) ran slower than serial codes" and tried to run the code it provides (here is a simplified copy):
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
    double sta, end;
    int bsize = 46000000;
    double *buffer = malloc(bsize * sizeof(double));
    int max_val = 0;

    if (buffer == NULL)
        return 1;

    /* Fill the buffer with pseudo-random values in [0, 9999]. */
    srand(time(NULL));
    for (int i = 0; i < bsize; i++)
        buffer[i] = rand() % 10000;

    /* Time only the parallel max reduction. */
    sta = omp_get_wtime();
    #pragma omp parallel for reduction(max : max_val)
    for (int i = 0; i < bsize; i++) {
        max_val = max_val > buffer[i] ? max_val : buffer[i];
    }
    end = omp_get_wtime();

    printf("time %f\n", end - sta);
    free(buffer);
    return 0;
}
Compiled with:
gcc test.c -o test -O3 -fopenmp
Since the original post was about performance, I noticed that the timing varied a lot from one run to another, so I ran the program 200 times and looked at the results:
$ for i in `seq 200`; do ./test ; done | sort -u
which gives me:
time 0.025260
time 0.025261
time 0.025272
time 0.025319
time 0.025321
...
time 0.036945
time 0.037185
time 0.037988
time 0.039659
time 0.040315
time 0.040645
time 0.041171
So the slowest run takes roughly 60% more time than the fastest one (0.041171 s versus 0.025260 s), which looks like a huge difference at first sight.
What could explain this?
I tried setting bsize to 450M, and then the timings are much closer to each other (±5%), but this puzzles me: does it mean that OpenMP performance is unpredictable except for very large workloads?
EDIT:
My configuration is Fedora Linux 26, kernel 4.13.4-200.fc26.x86_64 #1 SMP Thu Sep 28 20:46:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
No special environment variables are set (nothing specific to OpenMP).
The CPU is
Intel(R) Core(TM) i3-5010U CPU @ 2.10GHz
A typical unsorted sample is:
time 0.027549
time 0.026125
time 0.027008
time 0.026982
time 0.027888
time 0.025868
time 0.031977
time 0.044448
time 0.027515
time 0.032198
time 0.026162
time 0.025598
time 0.025791
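To put numbers on that sample, here is a minimal sketch that sorts a set of timings and reports the minimum, median and maximum. The `timing_summary` type and the `summarize` function are names made up for illustration, not part of any library. For the 13 values above it gives min 0.025598, median 0.027008 and max 0.044448, so the median is within about 6% of the fastest run and a few outliers account for most of the spread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: summary of a set of timings, in seconds. */
typedef struct {
    double min, median, max;
} timing_summary;

/* qsort comparator for doubles, ascending order. */
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts t in place and returns its minimum, median and maximum. */
timing_summary summarize(double *t, size_t n)
{
    assert(n > 0);
    qsort(t, n, sizeof *t, cmp_double);

    timing_summary s;
    s.min = t[0];
    s.max = t[n - 1];
    /* Middle element for odd n, mean of the two middle elements for even n. */
    s.median = (n % 2) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
    return s;
}
```

Comparing the median (or the minimum) across runs is usually less sensitive to a handful of slow outliers than comparing the extremes.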
EDIT 2: Thanks to the various comments, I eventually realized that this test was not meaningful, mainly because the timings are really too short compared to the overhead that task scheduling may introduce. I therefore consider this question not relevant. Thanks for the answers, and sorry for any time I may have wasted.
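For anyone who still wants to measure this, a less noisy approach is probably to repeat the reduction several times inside one process and discard a first warm-up pass, since the first parallel region typically also pays for creating the thread team. The following is only a sketch under those assumptions: `wall_time`, `parallel_max` and `time_repeated` are hypothetical helper names, and the accumulator is a `double` here rather than the `int` used in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds. Uses omp_get_wtime() when compiled with
   -fopenmp, and falls back to C11 timespec_get() otherwise. */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Same max reduction as in the question, with a double accumulator. */
double parallel_max(const double *buffer, int n)
{
    double max_val = buffer[0];
    #pragma omp parallel for reduction(max : max_val)
    for (int i = 0; i < n; i++)
        max_val = max_val > buffer[i] ? max_val : buffer[i];
    return max_val;
}

/* Runs one untimed warm-up pass, then `reps` timed passes. Each duration
   is stored in times[] and the fastest one is returned. */
double time_repeated(const double *buffer, int n, int reps, double *times)
{
    assert(n > 0 && reps > 0);

    /* Warm-up pass: not timed. The volatile sink keeps the calls from
       being optimized away. */
    volatile double sink = parallel_max(buffer, n);

    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = wall_time();
        sink = parallel_max(buffer, n);
        times[r] = wall_time() - t0;
        if (r == 0 || times[r] < best)
            best = times[r];
    }
    (void)sink;
    return best;
}
```

Calling `time_repeated(buffer, bsize, 20, times)` from `main` and printing the returned minimum would replace the single `omp_get_wtime()` measurement; whether the run-to-run spread actually shrinks on a given machine is something to check rather than assume.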