5 votes

I am using OpenMP and I want to spawn threads such that one thread executes one piece of code and finishes, in parallel with N threads running the iterations of a parallel-for loop.

Execution should be like this:

Section A (one thread)       ||      Section B (parallel-for, multiple threads)
         |                   ||        | | | | | | | | | |
         |                   ||        | | | | | | | | | |
         |                   ||        | | | | | | | | | |
         |                   ||        | | | | | | | | | |
         |                   ||        | | | | | | | | | |
         V                   ||        V V V V V V V V V V

I cannot just write a parallel-for with a #pragma omp single for Section A, because I do not want the thread that executes Section A to also execute iterations of the for-loop.

I have tried this:

#pragma omp parallel sections
{
    #pragma omp section
    {
         // Section A
    }

    #pragma omp section
    {
         // Section B;
         #pragma omp parallel for
         for (int i = 0; i < x; ++i)
             something();
    }
}

However, the parallel-for always executes with only one thread (I know because I made the loop body print omp_get_thread_num(), and it always prints the same number, either 1 or 0, depending on which of the two threads executed the second section).

I have also tried

#pragma omp sections
{
    #pragma omp section
    {
         // Section A
    }

    #pragma omp section
    {
         // Section B;
         #pragma omp parallel for
         for (int i = 0; i < x; ++i)
             something();
    }
}

This allows the for-loop to execute with multiple threads, but it makes the sections non-parallel: the first section runs to completion before the second section starts.

What I need is a combination of the two approaches, where each iteration of the for-loop and the first section are all run in parallel.


3 Answers

2 votes

Nested parallelism must be explicitly enabled, as it is disabled by default in most implementations. According to the OpenMP 4.0 standard, you must set the OMP_NESTED environment variable:

The OMP_NESTED environment variable controls nested parallelism by setting the initial value of the nest-var ICV. The value of this environment variable must be true or false. If the environment variable is set to true, nested parallelism is enabled; if set to false, nested parallelism is disabled. The behavior of the program is implementation defined if the value of OMP_NESTED is neither true nor false.

The following line should work for bash:

 export OMP_NESTED=true

Furthermore, as noted by @HristoIliev in the comment below, it's very likely that you want to set the OMP_NUM_THREADS environment variable to tune performance. Quoting the standard:

The value of this environment variable must be a list of positive integer values. The values of the list set the number of threads to use for parallel regions at the corresponding nested levels.

This means that one should set OMP_NUM_THREADS to something like n,n-1, where n is the number of CPU cores. For instance:

export OMP_NUM_THREADS=8,7

for an 8-core system (example copied from the comment below).

0 votes

Maybe the following code could do the trick:

#pragma omp parallel for
for (int i = -1; i < x; ++i) {
    if (i==-1) {
        // Section A
    }
    else {
        // Section B
        something();
    }
}

But you need to make sure that x >= 0.

0 votes

Have you tried a single parallel region with differentiated activity, e.g. a task for the one-off section? Nested parallelism should not be needed.

#pragma omp parallel
{
    #pragma omp single nowait
    {
        #pragma omp task
        once();
    }

    #pragma omp for
    for (int i = 0; i < N; i++) {
        many(i);
    }
}

You could also use an explicit if / else based on omp_get_thread_num() in a parallel region and do the work-sharing calculation yourself for the loop.