
I have a problem with OpenMP tasks. I am trying to create a parallel version of a "for" loop using OpenMP tasks. However, the execution time of this version is close to 2 times longer than the base version, where I use omp for, and I do not know the reason for this. Look at the code below:

omp for version:

t.start();
#pragma omp parallel num_threads(threadsNumber)
{
    for(int ts=0; ts<1000; ++ts)
    {
        // worksharing loop: iterations are split across the team
        // on every time step
        #pragma omp for
        for(int i=0; i<size; ++i)
        {
            array_31[i] = array_11[i] * array_21[i];
        }
    }
}
t.stop();
cout << "Time of omp for: " << t.time() << endl;

omp task version:

t.start();
#pragma omp parallel num_threads(threadsNumber)
{
    #pragma omp master
    {
        for(int ts=0; ts<1000; ++ts)
        {
            for(int th=0; th<threadsNumber; ++th)
            {
                // one task per thread-sized block; any thread in the
                // team may execute it
                #pragma omp task
                {
                    for(int i=th*blockSize; i<th*blockSize+blockSize; ++i)
                    {
                        array_32[i] = array_12[i] * array_22[i];
                    }
                }
            }

            // wait for this time step's tasks before starting the next one
            #pragma omp taskwait
        }
    }
}
t.stop();
cout << "Time of omp task: " << t.time() << endl;

In the task version I divide the loop in the same way as in omp for. Each task has to execute the same number of iterations, and the total number of tasks is equal to the total number of threads.
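For completeness, blockSize is not shown in the snippets; the division they assume is the following (a sketch of the setup, assuming size is divisible by threadsNumber):

// assumed setup, not shown above: equal-sized blocks per thread
const int blockSize = size / threadsNumber;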

Performance results:

Time of omp for: 4.54871
Time of omp task: 8.43251

What can be the problem? Is it possible to achieve similar performance for both versions? The attached code is simple, because I only wanted to illustrate the problem I am trying to resolve. I do not expect both versions to give me the same performance, but I would like to reduce the difference.

Thanks for any reply. Best regards.


2 Answers

0 votes

I think the issue here is the overhead. When you declare a loop as a worksharing for, the threads are all assigned their part of the loop at once. When you use tasks, the runtime must go through the whole setup process every time you launch a task. Why not just do the following:

#pragma omp parallel num_threads(threadsNumber)
{
    // note: no "omp master" wrapper here -- a worksharing "for" must be
    // encountered by all threads of the team, not just the master
    for(int ts=0; ts<1000; ++ts)
    {
        // the (typically static) default schedule hands the same th
        // block to the same thread on every time step
        #pragma omp for
        for(int th=0; th<threadsNumber; ++th)
        {
            for(int i=th*blockSize; i<th*blockSize+blockSize; ++i)
            {
                array_32[i] = array_12[i] * array_22[i];
            }
        }
    }
}
0 votes

I'd say that the issue you're experiencing here is related to data affinity: when you use #pragma omp for, the distribution of iterations across threads is the same for every value of ts, whereas with tasks you have no way to specify a binding of tasks to threads.
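If you want to keep tasks but avoid creating and synchronizing them by hand, one option is taskloop (OpenMP 4.5+, so it needs a newer compiler than the GCC 5.3 mentioned below). It still gives no control over which thread runs which chunk, but it cuts the bookkeeping. A minimal sketch, assuming the same arrays and blockSize as in the question:

#pragma omp parallel num_threads(threadsNumber)
{
    #pragma omp master
    {
        for(int ts=0; ts<1000; ++ts)
        {
            // creates tasks of roughly blockSize iterations each and
            // waits for them (implicit taskgroup), replacing the
            // manual task loop + taskwait
            #pragma omp taskloop grainsize(blockSize)
            for(int i=0; i<size; ++i)
            {
                array_32[i] = array_12[i] * array_22[i];
            }
        }
    }
}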

That said, I executed your program on my machine with three arrays of 1M elements, and the results of the two versions are closer:

  • t1_for: 2.041443s
  • t1_tasking: 2.159012s

(I used GCC 5.3.0 20151204)