How to distribute teams on GPU using OpenMP?

Question

I'm trying to use my NVIDIA GeForce GT 740M for parallel programming with OpenMP and the clang-3.8 compiler.

When the loop runs in parallel on the CPU, I get the desired result. When it runs on the GPU, however, the results are almost random numbers.

Therefore, I figured that I'm not correctly distributing my thread teams and that there might be some data races. I guess I have to do my for-loops differently but I have no idea where the mistake could be.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>


int main(int argc, char* argv[])
{
    const int n = 100;
    float a = 3.0f;
    float b = 2.0f;
    float *x = (float *) malloc(n * sizeof(float));
    float *y = (float *) malloc(n * sizeof(float));

    int i;
    int j;
    int k;

    double start;
    double end;
    start = omp_get_wtime();

    for (k = 0; k < n; k++) {
        x[k] = 2.0f;
        y[k] = 3.0f;
    }

    #pragma omp target data map(to:x[0:n]) map(tofrom:y[0:n]) map(to:i) map(to:j)
    {
        #pragma omp target teams
        #pragma omp distribute
        for (i = 0; i < n; i++) {
            #pragma omp parallel for
            for (j = 0; j < n; j++) {
                y[j] = a*x[j] + y[j];
            }
        }
    }

    end = omp_get_wtime();

    printf("Work took %f seconds.\n", end - start);

    free(x);
    free(y);

    return 0;
}

I suspect it might have something to do with the architecture of my GPU, so I'm adding this:

GPU Information

I'm fairly new to the topic, so thanks for your help :)

Arpith Jacob · Accepted Answer · 2017-05-19T14:55:16

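Yes, there is a race here. The outer i loop is distributed across teams, and every team runs the whole inner j loop, so different teams are reading from and writing to the same elements of the array 'y' at the same time. Perhaps you want something like this?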

for(i = 0; i < n; i++) {
  #pragma omp target teams distribute parallel for
  for (j = 0; j < n; j++){
    y[j] = a*x[j] + y[j];
  }
}
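
To show how that loop might sit in a complete routine, here is a minimal sketch. The helper name saxpy_repeat and its parameters are hypothetical (they are not from the original post), and the enclosing target data region is an assumption carried over from the question so that the arrays are mapped once rather than on every launch.

```c
/* Hypothetical helper (name and signature are illustrative only):
   applies y = a*x + y `reps` times. Each launch splits the j iterations
   across teams and threads, so every y[j] is updated by exactly one
   thread per launch. */
void saxpy_repeat(int n, int reps, float a, const float *x, float *y)
{
    /* Map the arrays once so they can stay on the device between launches. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        /* The repeat loop stays on the host; the launches run one after
           another, so no two launches touch y at the same time. */
        for (int i = 0; i < reps; i++) {
            #pragma omp target teams distribute parallel for
            for (int j = 0; j < n; j++) {
                y[j] = a * x[j] + y[j];
            }
        }
    }
}
```

Called with the question's values (n = 100, reps = 100, a = 3.0f, x filled with 2.0f, y filled with 3.0f), every element of y should end up as 3 + 100 * 6 = 603, which gives a quick way to check the GPU result against the CPU one.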
