The #pragma omp parallel directive creates a parallel region with a team of threads, where each thread executes the entire block of code that the region encloses.

From the OpenMP 5.1 standard one can read a more formal description:
When a thread encounters a parallel construct, a team of threads is
created to execute the parallel region (..). The
thread that encountered the parallel construct becomes the primary
thread of the new team, with a thread number of zero for the duration
of the new parallel region. All threads in the new team, including the
primary thread, execute the region. Once the team is created, the
number of threads in the team remains constant for the duration of
that parallel region.
The #pragma omp parallel for directive creates a parallel region (as described before), and the iterations of the loop that it encloses are assigned to the threads of that region, using the default chunk size and the default schedule, which is typically static. Bear in mind, however, that the default schedule might differ among concrete implementations of the OpenMP standard.

From the OpenMP 5.1 standard one can read a more formal description:
The worksharing-loop construct specifies that the iterations of one or
more associated loops will be executed in parallel by threads in the
team in the context of their implicit tasks. The iterations are
distributed across threads that already exist in the team that is
executing the parallel region to which the worksharing-loop region
binds.
Moreover,
The parallel loop construct is a shortcut for specifying a parallel
construct containing a loop construct with one or more associated
loops and no other statements.
Or informally, #pragma omp parallel for is a combination of the construct #pragma omp parallel with #pragma omp for. In your case, this would mean that:
#pragma omp parallel for
for(int i=0;i<n;i++)
{
    c[i]=a[i]+b[i];
}
is semantically, and logically, the same as:
#pragma omp parallel
{
#pragma omp for
for(int i=0;i<n;i++)
{
c[i]=a[i]+b[i];
}
}
TL;DR: In your example, with #pragma omp parallel for the loop iterations are divided among the threads, whereas with #pragma omp parallel every thread executes (in parallel) all of the loop iterations.
To make it more illustrative, with 4 threads the #pragma omp parallel would result in something like:

Thread 0: 0, 1, 2, ..., n-1
Thread 1: 0, 1, 2, ..., n-1
Thread 2: 0, 1, 2, ..., n-1
Thread 3: 0, 1, 2, ..., n-1

whereas #pragma omp parallel for with a chunk_size=1 and a static schedule would result in something like:

Thread 0: 0, 4, 8, ...
Thread 1: 1, 5, 9, ...
Thread 2: 2, 6, 10, ...
Thread 3: 3, 7, 11, ...
Code-wise the loop would be transformed to something logically similar to:
for(int i=omp_get_thread_num(); i < n; i+=omp_get_num_threads())
{
c[i]=a[i]+b[i];
}
where omp_get_thread_num()
The omp_get_thread_num routine returns the thread number, within the
current team, of the calling thread.
and omp_get_num_threads()
Returns the number of threads in the current team. In a sequential
section of the program omp_get_num_threads returns 1.
or in other words, for(int i = THREAD_ID; i < n; i += TOTAL_THREADS), with THREAD_ID ranging from 0 to TOTAL_THREADS - 1, and TOTAL_THREADS representing the total number of threads of the team created in the parallel region.
I have learned that we need to use #pragma omp parallel for while
using OpenMP on the for loop. But I have also tried the same thing
with #pragma omp parallel and it is also giving me the correct output.
It gives you the same output because in your code:

c[i]=a[i]+b[i];

array a and array b are only read, and c[i] is written with a value that does not depend on how many times iteration i is executed. Nevertheless, with #pragma omp parallel for each iteration is executed exactly once (by a single thread), whereas with #pragma omp parallel every thread executes all the iterations, so each position c[i] is written multiple times, with threads overriding each other's (identical) values.
Now try to do the same with the following code:

#pragma omp parallel for
for(int i=0;i<n;i++)
{
    c[i] = c[i] + a[i] + b[i];
}

and

#pragma omp parallel
{
    for(int i=0;i<n;i++)
    {
        c[i] = c[i] + a[i] + b[i];
    }
}

you will immediately notice the difference.