4 votes

Why does the parallel version take more time to execute than the single-threaded one? I am running it on an 8-CPU machine with Ubuntu 14.04. The code is just a simple way to test omp parallel sections; the eventual goal is to run two different functions in two different threads at the same time, which is why I do not want to use #pragma omp parallel for.

parallel:

#include <omp.h>

int main()
{
    int k = 0;
    int m = 0;

    omp_set_num_threads(2);

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                for( k = 0; k < 1e9; k++ ){};
            }

            #pragma omp section
            {
                for( m = 0; m < 1e9; m++ ){};
            }
        }
    }

    return 0;
}

and the single-threaded version:

int main()
{
    int m = 0;
    int k = 0;

    for( k = 0; k < 1e9; k++ ){};

    for( m = 0; m < 1e9; m++ ){};

    return 0;
}
I'm not sure of the answer, but it might help to post a benchmark below your code demonstrating the performance difference. – Will

If your loops are really empty, they will just get optimized away; the OpenMP code may not be. In other words, your example is useless. – dgrat
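
A minimal sketch of such a benchmark, assuming GCC or Clang with -fopenmp (the iteration count and variable names are illustrative); the volatile accumulator keeps the compiler from optimizing the loop away:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    volatile long sum = 0;   /* volatile: the compiler must keep the loop */
    double start = omp_get_wtime();

    for (long i = 0; i < 1000000000L; i++)
        sum += i;

    printf("elapsed: %f s (sum = %ld)\n", omp_get_wtime() - start, (long)sum);
    return 0;
}

Running the same timed loop inside a sections construct versus plain sequential code would show the actual difference rather than whatever the optimizer leaves behind.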

2 Answers

3 votes

If the compiler did not optimise the loops away, the parallel code would suffer from false sharing, because m and k are very likely to end up in the same cache line. Make the variables private:

#pragma omp parallel private(k,m)
{
    #pragma omp sections
    {
        #pragma omp section
        {
            for( k = 0; k < 1e9; k++ ){};
        }

        #pragma omp section
        {
            for( m = 0; m < 1e9; m++ ){};
        }
    }
}

At high optimisation levels, the compiler could drop the empty loops altogether. But then the parallel version still carries the added overhead of spawning the OpenMP worker threads and joining them afterwards, which makes it slower than the sequential version.
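
For reference, a minimal sketch of a version that sidesteps both problems, assuming compilation with -fopenmp (the workload and variable names are illustrative): each section accumulates real work into a stack-local variable, which is private by construction, and writes its shared result exactly once:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    long sum_k = 0, sum_m = 0;

    omp_set_num_threads(2);

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                long local = 0;            /* private accumulator: no false sharing */
                for (long k = 0; k < 1000000000L; k++)
                    local += k;
                sum_k = local;             /* single write to the shared variable */
            }

            #pragma omp section
            {
                long local = 0;
                for (long m = 0; m < 1000000000L; m++)
                    local += m;
                sum_m = local;
            }
        }
    }

    printf("%ld %ld\n", sum_k, sum_m);     /* use the results so the loops survive */
    return 0;
}

Because each thread only touches its own local accumulator during the loop, the two threads never ping-pong a shared cache line, and printing the sums keeps the optimizer from discarding the work.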

1 vote

In the above test code, the compiler optimises the empty loops away, so you need to change the test so that the loops do real work. Creating threads also adds overhead, which grows with the number of threads you create. Also refer to Amdahl's law.
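
To make Amdahl's law concrete: the maximum speedup on n threads is S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the runtime that is parallelizable. With an assumed p = 0.9 and n = 2, S = 1 / (0.1 + 0.45) ≈ 1.8x. If the compiler removes the loops entirely, p ≈ 0 and only the thread-management overhead remains, so the parallel version can only be slower.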