First of all, the official GCC 4.1.2 does not support OpenMP. You probably have a Red Hat-derived Linux distribution (RHEL, Fedora, CentOS, Scientific Linux, etc.) in which OpenMP support was backported into GCC 4.1.2 from a newer version. Red Hat maintained that backport for quite some time before finally switching to a newer GCC version.
Writing to a shared stream results in non-deterministic behaviour in both OpenMP sections and parallel loops. What you observe here is a result of the dynamic scheduling nature of the `sections` implementation in GCC. `libgomp` (the GCC OpenMP runtime) distributes sections on a first-come, first-served basis among the threads in the team. What probably happens in your case is that the sections are so short, and therefore so quick to execute, that the first thread to exit the docking barrier at the beginning of the parallel region consumes all work items before the other threads have even caught up, resulting in serial execution of all sections.
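Here is a minimal, self-contained sketch (not your code, just an illustration) that makes the distribution visible by printing which thread executes each section; with sections this short, `libgomp` will often let a single thread grab all of them:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("section 1 run by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section 2 run by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section 3 run by thread %d\n", omp_get_thread_num());
    }
    return 0;
}
```

Compiled with `gcc -fopenmp`, you may well see the same thread number on every line, which is exactly the serialisation described above.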
As for the `parallel for` loop, what you observe is the result of the default loop scheduling in `libgomp` being `static`, i.e. the 12 iterations are evenly split among the threads in a linear fashion. I would guess that there are 6 threads in your case (based on the text segments in the scrambled output), so thread 0 gets iterations 0 to 1, thread 1 gets iterations 2 to 3, and so on. Again, the execution order of the iterations within each thread is well defined, but there is no guarantee about the order in which the threads themselves run.
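To make the split concrete, here is a sketch (assuming 6 threads, as guessed above; the explicit `schedule(static)` just spells out the `libgomp` default) that prints which thread owns each of the 12 iterations; compile with `gcc -fopenmp -std=c99`:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* 12 iterations, static schedule, 6 threads:
       each thread gets a contiguous chunk of 2 iterations. */
    #pragma omp parallel for num_threads(6) schedule(static)
    for (int i = 0; i < 12; i++)
        printf("iteration %2d run by thread %d\n", i, omp_get_thread_num());
    return 0;
}
```

The iteration-to-thread mapping in the output is fixed (0-1 on thread 0, 2-3 on thread 1, ...), but the lines can appear in any order from run to run, which is what scrambles the output.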
Note that this behaviour is very GCC-specific. The OpenMP standard says:
> The method of scheduling the structured blocks among threads in the team is implementation defined.
For example, Intel's compiler distributes the sections in a round-robin fashion, i.e. section `n` is given to thread `n % num_threads`, much like what a `parallel for` loop with static scheduling and a chunk size of 1 would do.
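For illustration, that loop analogue would look like this (a sketch, reusing the 12 iterations from above):

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Chunk size of 1: iteration n goes to thread n % num_threads,
       mirroring how Intel's compiler hands out sections. */
    #pragma omp parallel for schedule(static, 1)
    for (int n = 0; n < 12; n++)
        printf("iteration %2d run by thread %d\n", n, omp_get_thread_num());
    return 0;
}
```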