I have an outer for loop that I have parallelized using OpenMP. However, within this for loop there are sections of code that can also be executed in parallel.

Can I use OpenMP's sections construct to parallelize these? Is this even possible? Since each iteration of the for loop is run by just one thread, can I, within each iteration, ask for certain sections of code to be run by multiple threads in parallel? The rest of the code should be run by one thread only, i.e. the thread to which that loop iteration has been assigned.

For example, I have the following piece of code:

omp_p = omp_get_max_threads();
omp_set_nested(1);
#pragma omp parallel for num_threads(omp_p/2)
for(int p=0;p<omp_p/2;p++){
   size_t a = (p*N)/(omp_p/2);
   size_t b = ((p+1)*N)/(omp_p/2);
   for(int i=a;i<b;i++){
      /*Work on A[a]->A[b]*/
      for(int j=0;j<n;j++){
         for(int k=0;k<N;k++){
            /*Serial code*/
            #pragma omp parallel sections
            {
               #pragma omp section
               {

               }
               #pragma omp section
               {

               }
            }
            /*Serial work*/
            #pragma omp parallel sections
            {
               #pragma omp section
               {

               }
               #pragma omp section
               {

               }
            }
            /*Serial code*/
         }
      }
   }
}

This makes the program run much slower than if I had not used the parallel sections at all.

Besides the huge overhead from nested parallelism, your i, j and k loop counters get the default sharing class of shared and should be explicitly declared private. - Hristo Iliev
Oh, I'm sorry, I forgot to declare them as ints within the for() parentheses. Corrected this. - user1715122
What prevents you from simply decomposing the loop among all threads and executing everything in the inner loops serially? Is N too low in comparison to the number of threads? - Hristo Iliev
That's how I had coded it, but I realized that apart from a few assignments (which need to be done serially), the innermost 'k' loop primarily has two big sections which can be executed in parallel. It is similar to a matrix multiplication code, with the matrices divided among threads and the bulk of the work happening in the innermost loop. So I was wondering if I should instead give a bigger chunk of the matrix to each thread and reserve two threads per chunk to perform the operations in the innermost loop in parallel. N ~ 5*omp_p - user1715122
OpenMP, like most other parallel programming paradigms, introduces synchronisation overhead, and as such it favours computational models where each thread does as much work as possible and synchronises as seldom as possible. Your implementation underutilises the threads, as there are serial parts outside the nested parallelism where the second thread is doing nothing. Remove the nested parallelism, make the outer loop use all threads, and then profile the code for things like false sharing. - Hristo Iliev
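
The restructuring suggested in the last comment could look roughly like the sketch below. It is only an illustration under assumptions: `flat_update`, `part_one` and `part_two` are hypothetical names, and the arithmetic inside them is a placeholder for the two "big sections" of the real k loop. The i loop is split across all threads, and everything inside one iteration runs serially on the thread that owns it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two "big sections" of the k loop. */
static double part_one(size_t i, size_t k) { return (double)(i + k); }
static double part_two(size_t i, size_t k) { return (double)(i * k); }

/* Flat version: one parallel region, no nested teams.
   A has N elements; n is the trip count of the j loop. */
void flat_update(double *A, size_t N, size_t n)
{
    /* Signed loop index keeps the loop valid for older OpenMP versions. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)N; i++) {
        double acc = 0.0;   /* declared inside the loop, so private per thread */
        for (size_t j = 0; j < n; j++) {
            for (size_t k = 0; k < N; k++) {
                acc += part_one((size_t)i, k);   /* formerly section 1 */
                acc += part_two((size_t)i, k);   /* formerly section 2 */
            }
        }
        A[i] = acc;         /* each thread writes only its own elements */
    }
}
```

The only synchronisation left is the implicit barrier at the end of the single parallel region, instead of two fork/join points per k iteration.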

1 Answer

Nested OpenMP parallelism is possible, but I doubt you will see any performance gain from it, for the following reasons:

  1. Nested parallel regions can create more threads than there are CPU cores, which can lead to a lot of context switching.
  2. Your parallel sections sit inside four nested for loops, so the overhead of creating and destroying a thread team is paid on every iteration of the innermost loop.
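
If the nested sections are kept anyway, the first point can at least be limited by capping the inner team size. Without a `num_threads` clause, an inner `parallel sections` region may request as many threads as the default team size, even though only two sections exist. A minimal sketch, assuming a hypothetical helper `two_sections` whose arithmetic is a placeholder for the real work:

```c
/* Runs two independent pieces of work in an inner team capped at two threads.
   two_sections and its arithmetic are placeholders, not the original code. */
double two_sections(double x)
{
    double r1 = 0.0, r2 = 0.0;   /* shared; each section writes a different one */

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        { r1 = x * 2.0; }        /* first independent piece */

        #pragma omp section
        { r2 = x + 1.0; }        /* second independent piece */
    }                            /* implicit barrier: both results are ready */

    return r1 + r2;
}
```

This does not address the second point: the team is still forked and joined on every call, so it only pays off when each section does substantially more work than that overhead.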