Short: Does the pragma omp for simd OpenMP directive generate code that uses SIMD registers?
Longer:
As stated in the OpenMP documentation, "The worksharing-loop SIMD construct specifies that the iterations of one or more associated loops will be distributed across threads that already exist [..] using SIMD instructions". From this statement, I would expect the following code (simd.c) to use XMM, YMM or ZMM registers when compiled with gcc simd.c -o simd -fopenmp, but it does not.
#include <stdio.h>
#define N 100
int main() {
    int x[N];
    int y[N];
    int z[N];
    int i;
    int sum = 0; /* must be initialized before the reduction */
    for(i=0; i < N; i++) {
        x[i] = i;
        y[i] = i;
    }
    #pragma omp parallel
    {
        #pragma omp for simd
        for(i=0; i < N; i++) {
            z[i] = x[i] + y[i];
        }
        #pragma omp for simd reduction(+:sum)
        for(i=0; i < N; i++) {
            sum += x[i];
        }
    }
    printf("%d %d\n",z[N/2], sum);
    return 0;
}
When I check the assembly generated by gcc simd.c -S -fopenmp, no SIMD register is used.
I can use SIMD registers without OpenMP by using the option -O3 because, according to the GCC documentation, it includes the -ftree-vectorize flag.
XMM registers: gcc simd.c -o simd -O3
YMM registers: gcc simd.c -o simd -O3 -march=skylake-avx512
ZMM registers: gcc simd.c -o simd -O3 -march=skylake-avx512 -mprefer-vector-width=512
However, using the flags -march=skylake-avx512 -mprefer-vector-width=512 combined with -fopenmp does not generate SIMD instructions.
Therefore, I can easily vectorize my code with -O3 without the pragma omp for simd, but not the other way around.
At this point, my purpose is not to generate SIMD instructions but to understand how OpenMP SIMD directives work in GCC and how to generate SIMD instructions with OpenMP alone (without -O3).
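To narrow this down, it may help to isolate the two loops in small functions, so the assembly for each loop is easy to find and the compiler cannot fold the fixed-size arrays in main. The following is only a sketch: the function names add_arrays and sum_array are hypothetical (they are not in simd.c), and it uses the plain simd directive without worksharing so that only the SIMD part is in play.

```c
/* Hypothetical helper (not in simd.c): element-wise sum of two int arrays.
   The simd directive asks the compiler to vectorize this loop. */
void add_arrays(const int *x, const int *y, int *z, int n) {
    int i;
    #pragma omp simd
    for (i = 0; i < n; i++) {
        z[i] = x[i] + y[i];
    }
}

/* Hypothetical helper (not in simd.c): sum reduction over an int array. */
int sum_array(const int *x, int n) {
    int i;
    int sum = 0; /* initialized before the reduction */
    #pragma omp simd reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}
```

Assuming this is saved as kernels.c (a made-up file name), it could be compiled with gcc -S -fopenmp-simd kernels.c at different optimization levels, and the resulting kernels.s searched for xmm, ymm or zmm to see at which level the directive starts to have an effect.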