Is it possible to do a reduction on an array with openmp?

Question

Does OpenMP natively support reduction of a variable that represents an array?

This would work something like the following...

float* a = (float*) calloc(4, sizeof(float));
omp_set_num_threads(13);
#pragma omp parallel reduction(+:a)
for(int i=0;i<4;i++){
   a[i] += 1;  // Thread-local copy of a incremented by something interesting
}
// a now contains [13 13 13 13]

Ideally, there would be something similar for an omp parallel for, and if you have a large enough number of threads for it to make sense, the accumulation would happen via binary tree.

Maybe you could explain a bit more about what exactly you want to do. Providing serial code might help. — FFox
Digging around a bit more, it sounds like "only in fortran" is the answer. I ended up just allocating a single large array of local copies outside of the loop, letting the threads accumulate to their own copies within the for loop, then accumulating into a global array after the for loop, still inside the parallel region, inside of a critical section. — Andrew Wagner
Digging even more, here is a research paper on something similar, but it's not in openmp yet. springerlink.com/content/tq76655852630525 — Andrew Wagner
You can probably use atomic rather than critical to guard the individual adds (or even an array of locks) if you want to reduce the overhead; you could even use an array of shared arrays rather than private arrays and try to roll your own binary reduction. But it'll be ugly. — Jonathan Dursi
I ended up manually allocating space for thread-local copies of the arrays. Each thread does 1/8 of the accumulation into its local copy, and then the threads accumulate their local copy into a global copy inside of a #pragma omp critical block. Since the number of cores (8) is much smaller than n, the synchronization overhead is negligible. It ain't pretty, but it works. — Andrew Wagner
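The manual approach described in the comments above can be sketched roughly as follows. This is a minimal illustration rather than the commenter's actual code: the function name accumulate_manual, its parameters, and the "+= 1.0f" placeholder work are all hypothetical. Each thread accumulates into its own private buffer, and the buffers are merged into the shared array inside a critical section while still in the parallel region.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds `iterations` contributions of 1.0f to every
// element of an array of length n, using one private buffer per thread.
std::vector<float> accumulate_manual(std::size_t n, int iterations)
{
    std::vector<float> global(n, 0.0f); // shared result

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread has its own copy.
        std::vector<float> local(n, 0.0f);

        // The loop iterations are divided among the threads.
        // nowait: a thread may start merging as soon as its share is done.
        #pragma omp for nowait
        for (int i = 0; i < iterations; ++i) {
            for (std::size_t k = 0; k < n; ++k) {
                local[k] += 1.0f; // placeholder for the real per-iteration work
            }
        }

        // Merge the private buffers one thread at a time.
        #pragma omp critical
        {
            for (std::size_t k = 0; k < n; ++k) {
                global[k] += local[k];
            }
        }
    }
    return global;
}
```

Calling accumulate_manual(4, 13) should return four elements equal to 13, regardless of the number of threads. The critical section is entered only once per thread, which is why its cost stays small when the thread count is much lower than the amount of work per thread.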

decvalts · Accepted Answer · 2016-11-07T17:14:25

Array reduction is now possible with OpenMP 4.5 for C and C++. Here's an example:

#include <iostream>

int main()
{

  int myArray[6] = {};

  #pragma omp parallel for reduction(+:myArray[:6])
  for (int i=0; i<50; ++i)
  {
    double a = 2.0; // Or something non-trivial justifying the parallelism...
    for (int n = 0; n<6; ++n)
    {
      myArray[n] += a;
    }
  }
  // Print the array elements to see them summed   
  for (int n = 0; n<6; ++n)
  {
    std::cout << myArray[n] << " " << std::endl;
  } 
}

Outputs:

100
100
100
100
100
100

I compiled this with GCC 6.2. You can see which common compiler versions support the OpenMP 4.5 features here: https://www.openmp.org/resources/openmp-compilers-tools/

Note from the comments above that while this syntax is convenient, it may incur significant overhead, because each thread gets its own private copy of the array section.
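The question used a heap-allocated pointer rather than a fixed-size array. Assuming a compiler with OpenMP 4.5 support, the same array-section syntax should also work on a pointer, provided the section length is given explicitly. The sketch below is illustrative only: accumulate_section and its parameters are hypothetical names, and n is assumed to be greater than zero.

```cpp
#include <vector>

// Hypothetical helper: adds `iterations` contributions of 1.0f to every
// element of a heap-allocated array of length n (n > 0 assumed).
std::vector<float> accumulate_section(int n, int iterations)
{
    std::vector<float> result(n, 0.0f);

    // The reduction clause needs a plain variable or an array section,
    // so use a raw pointer to the vector's storage.
    float* a = result.data();

    // a[:n] names elements 0..n-1. Each thread works on a private,
    // zero-initialized copy of that section; the copies are summed into
    // the original storage when the loop finishes.
    #pragma omp parallel for reduction(+:a[:n])
    for (int i = 0; i < iterations; ++i) {
        for (int k = 0; k < n; ++k) {
            a[k] += 1.0f; // placeholder for the real per-iteration work
        }
    }
    return result;
}
```

Calling accumulate_section(6, 50) should return six elements equal to 50. Since every thread holds a full private copy of the section, memory use grows with both n and the thread count, so for very large arrays it may be worth measuring this against a hand-written reduction.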
