I was looking at the question "omp parallel for loop (reduction to find max) ran slower than serial codes", and I tried running the code provided there (here is a simplified copy):

#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#include <omp.h>

int main()
{
    double sta, end, elapse_t;
    int bsize = 46000000;
    double *buffer = malloc(bsize * sizeof(double));
    int max_val = 0;

    srand(time(NULL));
    for (int i = 0; i < bsize; i++)
        buffer[i] = rand() % 10000;

    sta = omp_get_wtime();

#pragma omp parallel for reduction(max : max_val)
    for (int i = 0; i < bsize; i++) {
        max_val = max_val > buffer[i] ? max_val : buffer[i];
    }
    end = omp_get_wtime();
    printf("time %f\n", end - sta);

    free(buffer);
    return 0;
}

Compiled with:

gcc test.c -o test -O3 -fopenmp

Since the original post was about performance, I noticed that the result varied a lot from one run to another, so I ran it 200 times and looked at the timings:

$ for i in `seq 200`; do ./test ; done | sort -u

which gives me:

time 0.025260
time 0.025261
time 0.025272
time 0.025319
time 0.025321
...
time 0.036945
time 0.037185
time 0.037988
time 0.039659
time 0.040315
time 0.040645
time 0.041171

So you can see that the slowest run (0.041171 s) takes roughly 60% more time than the fastest (0.025260 s), which is a huge difference at first sight.

What could explain this?

I tried setting bsize to 450M, and then the timings are quite similar (±5%), but I am puzzled: does this mean that OpenMP performance is unpredictable except for very large workloads?

EDIT:

My configuration is Fedora Linux 26, kernel 4.13.4-200.fc26.x86_64 #1 SMP Thu Sep 28 20:46:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

No special environment variables (nothing particular to OpenMP).

The CPU is:

Intel(R) Core(TM) i3-5010U CPU @ 2.10GHz

A typical unsorted sample is:

time 0.027549
time 0.026125
time 0.027008
time 0.026982
time 0.027888
time 0.025868
time 0.031977
time 0.044448
time 0.027515
time 0.032198
time 0.026162
time 0.025598
time 0.025791

EDIT 2: Thanks to the various comments, I eventually realized that this test was not pertinent, especially because the timings are really too short compared to the overhead that task scheduling may introduce. I thus consider this question to be not relevant. Thanks for the answers, and sorry for any time I may have wasted.

We need more info: 1) system specification, particularly the CPU; 2) operating system and environment variables; 3) unsorted times. - Zulan
This being a mobile CPU, I assume this is a system with a graphical UI. What effort did you make to avoid background applications affecting the measurement? - Zulan
Just out of curiosity, why do you use int max_val when buffer is a double array? - itsnevertoobadtoaskforhelp
One way to rule out that source of variation is to not use a different random array every time. I also agree with the comment that the timings are too short. Just as an experiment, maybe add more work to the loop: add or multiply some number into the array elements before comparing? - itsnevertoobadtoaskforhelp

1 Answer

The timings you posted are extremely small, so many other factors come into play, not only OpenMP. Consider either using a larger dataset or repeating the loop several times (if that makes sense).
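As a rough illustration of the "repeat the loop several times" suggestion, here is a minimal sketch that runs the same max reduction once untimed and then several times timed, keeping the fastest pass. The helper names (wall_seconds, max_reduce, best_time) are made up for this example and are not from the question; the accumulator is a double here, unlike the int max_val in the original code.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds with OpenMP, CPU time from
   clock() as a fallback when built without -fopenmp. */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Same max reduction as in the question, with a double accumulator. */
double max_reduce(const double *buf, int n)
{
    double m = buf[0];
#pragma omp parallel for reduction(max : m)
    for (int i = 0; i < n; i++)
        m = m > buf[i] ? m : buf[i];
    return m;
}

/* One untimed warm-up pass, then `reps` timed passes.
   Returns the fastest pass in seconds and stores the maximum in *out_max. */
double best_time(const double *buf, int n, int reps, double *out_max)
{
    assert(n > 0 && reps > 0);
    double best = -1.0;
    *out_max = max_reduce(buf, n);   /* warm-up pass, not timed */
    for (int r = 0; r < reps; r++) {
        double t0 = wall_seconds();
        double m = max_reduce(buf, n);
        double t = wall_seconds() - t0;
        if (best < 0.0 || t < best)
            best = t;
        *out_max = m;
    }
    return best;
}
```

In the question's program, a call such as best_time(buffer, bsize, 20, &m) could replace the single timed loop. Reporting the minimum (or the median) of several passes inside one process tends to be less sensitive to background activity than comparing 200 separate process launches.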

Also, consider that the overhead of creating the threads might not be amortized by the extremely simple loop body inside the OpenMP pragma (simply searching for the maximum value in a vector), which is run only once.
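One way to get a feel for that overhead is to time a parallel region that does almost no work. The sketch below is only an assumption-laden probe, not a rigorous benchmark; seconds_now and time_empty_region are hypothetical names introduced for this example.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: wall-clock seconds with OpenMP, CPU time from
   clock() as a fallback when built without -fopenmp. */
double seconds_now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Times one parallel region whose body only counts the threads that ran.
   Returns the elapsed seconds, or -1.0 if the region did not execute. */
double time_empty_region(void)
{
    int touched = 0;
    double t0 = seconds_now();
#pragma omp parallel reduction(+ : touched)
    {
        touched += 1;   /* each thread contributes 1 */
    }
    double elapsed = seconds_now() - t0;
    return touched > 0 ? elapsed : -1.0;
}
```

Calling time_empty_region() twice in a row and printing both values shows what the first region costs compared with a later one. On runtimes that keep a thread pool alive between regions, the first call usually includes thread creation and the second does not, but that behavior is implementation-dependent.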