OpenACC Loop not vectorized/parallelized: contains call

Question

I am trying to parallelize a program that builds Voronoi diagrams with OpenACC. I am currently struggling with parallelizing a nested for loop that calls a function defined in a different file. I know that you are supposed to use the #pragma acc routine directive on the function, but I can't make it work in my program.

I am using the PGI compiler and I am getting the following output:

     69, Loop is parallelizable
         Generating Multicore code
         69, #pragma acc loop gang
     72, Loop is parallelizable
         Loop not vectorized/parallelized: contains call

The relevant code from the main file that produces this output is the following:

main.c

#pragma acc routine seq
extern int determineColor(int, int, int*, int);


int main(int argc, char** argv){

        ....

        unsigned int(*imageBuffer)[yDim] = malloc(sizeof(int[xDim][yDim]));

        #pragma acc kernels loop
        for (int y = 0; y < yDim; y++)
        {
            for (int x = 0; x < xDim; x++)
            {
                imageBuffer[x][y] = determineColor(x, y, points, numPoints);
            }
        }
}

Voronoi.c

#pragma acc routine seq
int determineColor(int x, int y, int* points, int numPoints)
{
   ...
}

Mat Colgrove · Accepted Answer · 2020-06-12T15:31:28

I'm assuming you're using the flag "-Minfo" and hence getting all compiler feedback, including host code generation feedback, not just feedback for OpenACC.

     69, Loop is parallelizable
         Generating Multicore code
         69, #pragma acc loop gang

This OpenACC output looks correct in that the compiler is successfully parallelizing the outer "y" loop and applying a gang schedule.

 72, Loop is parallelizable

This OpenACC message indicates that the loop could be parallelized. However, when targeting a multicore CPU, only gang-level parallelism is applied. If targeting a Tesla device, the compiler would most likely schedule the inner loop using a vector schedule.

The confusion comes here:

     Loop not vectorized/parallelized: contains call

This is a host code generation feedback message indicating that the inner loop is not being vectorized due to the call. The "parallelized" part refers to host-side auto-parallelization (i.e. the -Mconcur flag), which isn't applicable in this case.

Note, you can limit the feedback messages to only the OpenACC ones by using the sub-option "-Minfo=accel".
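If it helps to see the gang and vector schedules described above written out explicitly instead of left to "kernels", here is a minimal sketch. Several things in it are assumptions and not taken from the question: the body of determineColor (elided in the original) is replaced by a hypothetical nearest-point lookup over points stored as x0, y0, x1, y1, ...; fillImage is a made-up helper name; the image is a flat buffer rather than the question's 2D array type; and the copyin/copyout sizes are guesses based on that layout. With the PGI compiler this would typically be built with something like "-ta=multicore -Minfo=accel" or "-ta=tesla -Minfo=accel"; a compiler without OpenACC support simply ignores the pragmas and runs the loops serially.

```c
/* Hypothetical stand-in for the elided determineColor body: returns the
   index of the point closest to (x, y). Assumes points are stored as
   x0, y0, x1, y1, ... which is NOT stated in the question. */
#pragma acc routine seq
int determineColor(int x, int y, int *points, int numPoints)
{
    int best = 0;
    long bestDist = -1;
    for (int i = 0; i < numPoints; i++) {
        long dx = x - points[2 * i];
        long dy = y - points[2 * i + 1];
        long d = dx * dx + dy * dy;   /* squared distance is enough to compare */
        if (bestDist < 0 || d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}

/* Hypothetical helper: fills a flat xDim*yDim buffer, indexed as
   image[x * yDim + y] to mirror imageBuffer[x][y] in the question. */
void fillImage(unsigned int *image, int xDim, int yDim,
               int *points, int numPoints)
{
    /* Outer loop: gang schedule (the only level applied on multicore). */
    #pragma acc parallel loop gang copyin(points[0:2*numPoints]) copyout(image[0:xDim*yDim])
    for (int y = 0; y < yDim; y++) {
        /* Inner loop: vector schedule, which matters when targeting a GPU. */
        #pragma acc loop vector
        for (int x = 0; x < xDim; x++) {
            /* Each iteration calls the sequential device routine. */
            image[x * yDim + y] =
                (unsigned int)determineColor(x, y, points, numPoints);
        }
    }
}
```

The "routine seq" directive is what lets the call appear inside the parallel loops; the host-side "contains call" message can still show up with plain "-Minfo", since it concerns host vectorization rather than the OpenACC schedule.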

Hope this helps!
