I have a problem with OpenMP tasks. I am trying to create a parallel version of a "for" loop using omp tasks. However, the execution time of this version is close to 2 times longer than that of the base version, where I use omp for, and I do not know the reason. Look at the code below:
omp for version:
    t.start();
    #pragma omp parallel num_threads(threadsNumber)
    {
        for(int ts=0; ts<1000; ++ts)
        {
            #pragma omp for
            for(int i=0; i<size; ++i)
            {
                array_31[i] = array_11[i] * array_21[i];
            }
        }
    }
    t.stop();
    cout << "Time of omp for: " << t.time() << endl;
omp task version:
    t.start();
    #pragma omp parallel num_threads(threadsNumber)
    {
        #pragma omp master
        {
            for(int ts=0; ts<1000; ++ts)
            {
                for(int th=0; th<threadsNumber; ++th)
                {
                    #pragma omp task
                    {
                        for(int i=th*blockSize; i<th*blockSize+blockSize; ++i)
                        {
                            array_32[i] = array_12[i] * array_22[i];
                        }
                    }
                }
                #pragma omp taskwait
            }
        }
    }
    t.stop();
    cout << "Time of omp task: " << t.time() << endl;
In the task version I divide the loop in the same way as in the omp for version. Each task executes the same number of iterations, and the total number of tasks is equal to the total number of threads.
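To make the partitioning concrete, here is a minimal sketch of one time step of the task version wrapped in a function. It rests on assumptions not shown in my snippets: `blockSize` is taken to be `size / threadsNumber`, the last block absorbs any remainder, the arrays are `std::vector<double>`, and `multiplyWithTasks` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise product, one OpenMP task per block.
// Assumption: blockSize = size / threadsNumber; the last block also takes
// the remainder when size is not divisible by threadsNumber.
void multiplyWithTasks(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::vector<double>& out,
                       int threadsNumber)
{
    assert(threadsNumber > 0);
    assert(a.size() == b.size() && a.size() == out.size());

    const int size = static_cast<int>(a.size());
    const int blockSize = size / threadsNumber;

    #pragma omp parallel num_threads(threadsNumber)
    {
        // Only the master thread creates tasks; the other threads of the
        // team pick them up while waiting at the end of the parallel region.
        #pragma omp master
        {
            for (int th = 0; th < threadsNumber; ++th)
            {
                const int begin = th * blockSize;
                const int end = (th == threadsNumber - 1) ? size
                                                          : begin + blockSize;

                // begin and end are local to the region, so each task gets
                // its own copy (firstprivate by default); a, b and out are
                // shared.
                #pragma omp task
                {
                    for (int i = begin; i < end; ++i)
                    {
                        out[i] = a[i] * b[i];
                    }
                }
            }
            // Wait until all tasks created above have finished.
            #pragma omp taskwait
        }
    }
}
```

Calling `multiplyWithTasks(a, b, out, threadsNumber)` once corresponds to one iteration of the `ts` loop. Note that in this form the parallel region is opened on every call, whereas my benchmark opens it once around all 1000 time steps.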
Performance results:
Time of omp for: 4.54871
Time of omp task: 8.43251
What could be the problem? Is it possible to achieve similar performance for both versions? The attached code is simple because I only wanted to illustrate the problem I am trying to resolve. I do not expect both versions to give the same performance, but I would like to reduce the difference.
Thanks for any replies. Best regards.