I have the following situation: I have a big outer for loop that essentially contains a function foo(). Within foo(), there are bar1() and bar2() that can be carried out concurrently, and bar3() that need to be performed after bar1() and bar2() are done. I have parallelized the big outer loop, and section bar1() and bar2(). I assume that each outer loop thread will generate their own section threads, is this correct?
If the assumption above is correct, how do I get bar3() to perform only after threads carrying out bar1() and bar2() finished? If I use critical, it will halt on all threads, including the outer for loop. If I use single, there's no guarantee that bar1() and bar2() will finish.
If the assumption above is not correct, how do I force the outer loop threads to resuse threads for bar1(), bar2() and not generate new threads every time?
Note that temp is a variable whose init and clear are expensive so I pull init and clear outside the for loop. It further complicates matter because both bar1() and bar2() needs some kind of temp variable. Optimally, temp should be init and cleared for each thread that is created, but I'm not sure how to force that for the threads generated for sections. (Without the sections pragma, it works fine in the parallel block).
main(){
#pragma omp parallel private(temp)
init(temp);
#pragma omp for schedule(static)
for (i=0;i<100000;i++) {
foo(temp);
}
clear(temp);
}
foo() {
init(x); init(y);
#pragma omp sections
{
{ bar1(x,temp); }
#pragma omp section
{ bar2(y,temp); }
}
bar3(x,y,temp);
}
bar1
,bar2
, andbar3
run sequentially for each thread. As long asx
,y
andtemp
don't only depend oni
(or are independent of alli
) then it's fine. If this is not the case then you need to be more clear in your question. – Z boson