I have a situation where I need to repeat a specific iteration of a loop multiple times. In that iteration, I decrement the loop index by one so that the next increment of the index effectively re-runs the same iteration.
This approach, which is the one I have to keep, works for multi-threaded OpenMP code. However, it does not work with OpenACC (for either the multicore or the tesla target). I get the following error:
Floating point exception (core dumped)
Here is the code for both cases:
#include <stdio.h>
#include <omp.h>
#include <unistd.h>

int main() {
    int x = 52;
    int count = 5;
    int i;

    omp_set_num_threads(6);
    #pragma omp parallel for
    for (i = 0; i < 100; i++) {
        if (i == x) {
            printf("%d\n", i);
            i--;
            count--;
            if (count == 0)
                x = 10000;
        }
    }

    int gpu_count = 0;
    count = 5;
    x = 52;
    #pragma acc parallel loop independent
    for (i = 0; i < 1000000; i++) {
        if (i == x) {
            #pragma acc atomic
            gpu_count++;
            i--;
            count--;
            if (count == 0)
                x = 2000000;
        }
    }
    printf("gpu_count: %d\n", gpu_count);
    return 0;
}
For OpenMP, I get the correct output:
52
52
52
52
52
But for OpenACC, I get the error mentioned above.
If I comment out the `i--;` line in the OpenACC loop, the code runs correctly and prints the number of repeated iterations (which is 1, since each index is then visited only once).
Note: I am using PGI 16.5 with a GeForce GTX 970 and CUDA 7.5.
I compile with the PGI compiler as follows:
pgcc -mp -acc -ta=multicore -g f1.c
So, my question is: why do I see this behavior? Am I not allowed to modify the loop index variable in OpenACC?