First of all, the official GCC 4.1.2 does not support OpenMP. You probably have a Red Hat-derived Linux distribution (RHEL, Fedora, CentOS, Scientific Linux, etc.) in which OpenMP support was backported into GCC 4.1.2 from a newer version. Red Hat maintained that backport for quite some time before finally switching to a newer GCC version.
Writing to a shared stream results in non-deterministic behaviour in both OpenMP sections and parallel loops. What you observe here is a result of the dynamic scheduling nature of the `sections` implementation in GCC. `libgomp` (the GCC OpenMP runtime) distributes sections on a first-come, first-served basis among the threads in the team. What probably happens in your case is that the sections are so short, and therefore so quick to execute, that the first thread to exit the docking barrier at the beginning of the parallel region consumes all work items before the other threads have even caught up, resulting in serial execution of all sections.
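Here is a minimal, self-contained sketch (not your code, just an illustration) that makes the distribution visible by printing which thread executes each section; with sections this short, `libgomp` will often let a single thread grab all of them:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("section 1 run by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section 2 run by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("section 3 run by thread %d\n", omp_get_thread_num());
    }
    return 0;
}
```

Compiled with `gcc -fopenmp`, you may well see the same thread number on every line, which is exactly the serialisation described above.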
As for the `parallel for` loop, what you observe is the result of the default loop scheduling in `libgomp` being `static`, i.e. the 12 iterations are evenly split among the threads in a linear fashion. I would guess that there are 6 threads in your case (based on the text segments in the scrambled output), so thread 0 gets iterations 0 to 1, thread 1 gets iterations 2 to 3, and so on. Again, the execution order of the iterations within each thread is well defined, but there is no guarantee about the order in which the threads themselves run.
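To make the split concrete, here is a sketch (assuming 6 threads, as guessed above; the explicit `schedule(static)` just spells out the `libgomp` default) that prints which thread owns each of the 12 iterations; compile with `gcc -fopenmp -std=c99`:

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* 12 iterations, static schedule, 6 threads:
       each thread gets a contiguous chunk of 2 iterations. */
    #pragma omp parallel for num_threads(6) schedule(static)
    for (int i = 0; i < 12; i++)
        printf("iteration %2d run by thread %d\n", i, omp_get_thread_num());
    return 0;
}
```

The iteration-to-thread mapping in the output is fixed (0-1 on thread 0, 2-3 on thread 1, ...), but the lines can appear in any order from run to run, which is what scrambles the output.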
Note that this behaviour is very GCC-specific. The OpenMP standard says:
> The method of scheduling the structured blocks among threads in the team is implementation defined.
For example, Intel's compiler distributes the sections in a round-robin fashion, i.e. section `n` is given to thread `n % num_threads`, much like what a `parallel for` loop with static scheduling and a chunk size of 1 would do.
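For illustration, that loop analogue would look like this (a sketch, reusing the 12 iterations from above):

```c
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Chunk size of 1: iteration n goes to thread n % num_threads,
       mirroring how Intel's compiler hands out sections. */
    #pragma omp parallel for schedule(static, 1)
    for (int n = 0; n < 12; n++)
        printf("iteration %2d run by thread %d\n", n, omp_get_thread_num());
    return 0;
}
```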