I need to run a short outer loop and a long inner loop. I would like to parallelize the latter but not the former, because an array is updated after each run of the inner loop. The code I am using is the following:
    #pragma omp parallel
    {
        for(j=0;j<3;j++){
            s=0;
            #pragma omp for reduction(+:s)
            for(i=0;i<10000;i++)
                s+=1;
            A[j]=s;
        }
    }
This hangs at run time. The following version works fine, but I would rather avoid the overhead of starting a new parallel region on every outer iteration, since this code is already preceded by another parallel region:
    for(j=0;j<3;j++){
        s=0;
        #pragma omp parallel for reduction(+:s)
        for(i=0;i<10000;i++)
            s+=1;
        A[j]=s;
    }
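For reference, a self-contained version of the working variant above might look like the sketch below. The function name `fill_counts` and its parameters are hypothetical, and it assumes `A` is an `int` array with at least `outer` elements. If OpenMP is not enabled at compile time (for example, without `-fopenmp` on GCC), the pragma is ignored and the loops simply run serially.

```c
/* Sketch of the working variant: one parallel region per outer iteration.
   fill_counts and its parameters are hypothetical names for illustration. */
void fill_counts(int *A, int outer, int inner)
{
    for (int j = 0; j < outer; j++) {
        int s = 0;
        /* A parallel region is entered here on every pass; s is reduced
           across threads, and i is private as the parallel loop variable. */
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < inner; i++)
            s += 1;
        /* Executed only by the one thread that runs the outer loop. */
        A[j] = s;
    }
}
```

Calling `fill_counts(A, 3, 10000)` corresponds to the loop bounds used in the snippets above.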
What is the correct (and fastest) way of doing this?