0 votes

I have the following situation: I have a big outer for loop that essentially contains a function foo(). Within foo(), there are bar1() and bar2() that can be carried out concurrently, and bar3() that needs to be performed after bar1() and bar2() are done. I have parallelized the big outer loop, and used sections for bar1() and bar2(). I assume that each outer loop thread will generate its own section threads, is this correct?

If the assumption above is correct, how do I get bar3() to run only after the threads carrying out bar1() and bar2() have finished? If I use critical, it will block all threads, including those in the outer for loop. If I use single, there is no guarantee that bar1() and bar2() have finished.

If the assumption above is not correct, how do I force the outer loop threads to reuse threads for bar1() and bar2() instead of generating new threads every time?

Note that temp is a variable whose init and clear are expensive, so I pull init and clear outside the for loop. It further complicates matters because both bar1() and bar2() need some kind of temp variable. Ideally, temp should be initialized and cleared once for each thread that is created, but I'm not sure how to force that for the threads generated for sections. (Without the sections pragma, it works fine in the parallel block.)

main() {
    #pragma omp parallel private(temp)
    {
        init(temp);
        #pragma omp for schedule(static)
        for (i = 0; i < 100000; i++) {
            foo(temp);
        }
        clear(temp);
    }
}

foo() {
    init(x); init(y);
    #pragma omp sections
    {
        { bar1(x,temp); }
        #pragma omp section
        { bar2(y,temp); }
    }
    bar3(x,y,temp);
}
Why do you want to use nested parallelism? You have already parallelized the loop, so bar1, bar2, and bar3 run sequentially within each thread. As long as x, y, and temp only depend on i (or are independent of i) then it's fine. If this is not the case then you need to be clearer in your question. – Z boson
@Zboson bar1 and bar2 could run in parallel; I want extra threads to run them concurrently, but those extra threads should not be created at every iteration of the loop. Does this make more sense? – nullgraph

1 Answer

1 vote

I believe that simply parallelizing the outer for loop should give you enough parallelism to saturate the CPU. But if you really want to run the two functions concurrently, the following code should work.

main(){
    #pragma omp parallel private(temp) 
    {
        init(temp);
        #pragma omp for schedule(static)
        for (i=0;i<100000;i++) {
            foo(temp);
        }
        clear(temp);
    }
}

foo() {
    init(x); init(y);

    /* bar1 runs as a child task, possibly on another thread of the team */
    #pragma omp task
    bar1(x,temp);

    /* the current thread runs bar2 in the meantime */
    bar2(y,temp);

    /* wait for this thread's child task (bar1), without a full barrier */
    #pragma omp taskwait

    bar3(x,y,temp);
}