20
votes

Does OpenMP natively support reduction of a variable that represents an array?

This would work something like the following...

float* a = (float*) calloc(4, sizeof(float));  // calloc takes (count, size)
omp_set_num_threads(13);
#pragma omp parallel reduction(+:a)
for(i=0;i<4;i++){
   a[i] += 1;  // Thread-local copy of a incremented by something interesting
}
// a now contains [13 13 13 13]

Ideally, there would be something similar for an omp parallel for, and with enough threads for it to make sense, the accumulation would happen via a binary tree.

5
Maybe you could explain a bit more about what exactly you want to do. Providing serial code might help. - FFox
Digging around a bit more, it sounds like "only in fortran" is the answer. I ended up just allocating a single large array of local copies outside of the loop, letting the threads accumulate to their own copies within the for loop, then accumulating into a global array after the for loop, still inside the parallel region, inside of a critical section. - Andrew Wagner
Digging even more, here is a research paper on something similar, but it's not in openmp yet. springerlink.com/content/tq76655852630525 - Andrew Wagner
You can probably use atomic rather than critical to guard the individual adds (or even an array of locks) if you want to reduce the overhead; you could even use an array of shared arrays rather than private arrays and try to roll your own binary reduction. But it'll be ugly. - Jonathan Dursi
I ended up manually allocating space for thread-local copies of the arrays. Each thread does 1/8 of the accumulation into its local copy, and then the threads accumulate their local copy into a global copy inside of a #pragma omp critical block. Since the number of cores (8) is much smaller than n, the synchronization overhead is negligible. It ain't pretty, but it works. - Andrew Wagner
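The workaround described in these comments (per-thread local copies, merged into a global array inside a critical section) could look roughly like the sketch below. The function name `histogram` and the bucket-counting workload are hypothetical stand-ins for the real accumulation, and the input values are assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts how many non-negative values fall into each of `bins` buckets.
std::vector<int> histogram(const std::vector<int>& values, int bins) {
  assert(bins > 0);
  std::vector<int> global(bins, 0);  // shared result array
  const long n = static_cast<long>(values.size());

  #pragma omp parallel
  {
    // Each thread accumulates into its own copy, so the hot loop needs no locking.
    std::vector<int> local(bins, 0);

    #pragma omp for nowait
    for (long i = 0; i < n; ++i) {
      local[values[i] % bins] += 1;
    }

    // One merge per thread; only this short loop is serialized.
    #pragma omp critical
    for (int b = 0; b < bins; ++b) {
      global[b] += local[b];
    }
  }
  return global;
}
```

The merge cost is proportional to the number of threads times the array length, so this tends to pay off when the loop does far more work than the final merge. If the pragmas are ignored (compiled without OpenMP), the same code runs serially and gives the same result.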

5 Answers

8
votes

Array reduction is now possible with OpenMP 4.5 for C and C++. Here's an example:

#include <iostream>

int main()
{

  int myArray[6] = {};

  #pragma omp parallel for reduction(+:myArray[:6])
  for (int i=0; i<50; ++i)
  {
    double a = 2.0; // Or something non-trivial justifying the parallelism...
    for (int n = 0; n<6; ++n)
    {
      myArray[n] += a;
    }
  }
  // Print the array elements to see them summed   
  for (int n = 0; n<6; ++n)
  {
    std::cout << myArray[n] << " " << std::endl;
  } 
}

Outputs:

100
100
100
100
100
100

I compiled this with GCC 6.2. You can see which common compiler versions support the OpenMP 4.5 features here: https://www.openmp.org/resources/openmp-compilers-tools/

Note from the comments above that while this syntax is convenient, it may incur significant overhead, because each thread gets its own private copy of the array section.
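The same array-section syntax should also cover the heap-allocated case from the question, where the variable is a pointer rather than a fixed-size array. The sketch below assumes an OpenMP 4.5 compiler; `accumulate_ones` is a hypothetical name, and the "add 1" body stands in for real work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: each of `iterations` loop trips adds 1 to every element.
std::vector<float> accumulate_ones(int n, int iterations) {
  assert(n > 0);
  std::vector<float> result(n, 0.0f);
  // The array section is written against a raw pointer; for a pointer,
  // the section length must be given explicitly.
  float* a = result.data();

  #pragma omp parallel for reduction(+ : a[0:n])
  for (int it = 0; it < iterations; ++it) {
    for (int i = 0; i < n; ++i) {
      a[i] += 1.0f;  // updates the thread's private copy of a[0:n]
    }
  }
  // After the loop, the private copies have been summed back into result.
  return result;
}
```

Calling `accumulate_ones(4, 13)` mirrors the question's example: every element ends up as 13, regardless of how many threads share the 13 iterations.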

3
votes

Only in Fortran in OpenMP 3.0, and probably only with certain compilers.

See the last example (Example 3) on:

http://wikis.sun.com/display/openmp/Fortran+Allocatable+Arrays

2
votes

The latest OpenMP 4.5 spec now supports reduction of C/C++ arrays. http://openmp.org/wp/2015/11/openmp-45-specs-released/

The latest GCC release, 6.1, also supports this feature. http://openmp.org/wp/2016/05/gcc-61-released-supports-openmp-45/

I haven't tried it yet, though. I hope others can test this feature.

1
votes

In C and C++, OpenMP versions before 4.5 cannot perform reductions on array or structure type variables (see restrictions).

You might also want to read up on the private and shared clauses. private declares a variable to be private to each thread, whereas shared declares a variable to be shared among all threads. I also found the answer to this question very useful with regard to OpenMP and arrays.
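As a small illustration of the two clauses, here is a sketch in which `scratch` is private (each thread works on its own copy) and `total` is shared (all threads update the same variable, so the update is guarded with atomic). The function name `sum_of_squares` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns 0^2 + 1^2 + ... + (n-1)^2.
int sum_of_squares(int n) {
  assert(n >= 0);
  int total = 0;    // shared: one variable visible to all threads
  int scratch = 0;  // private: each thread gets its own uninitialized copy in the loop

  #pragma omp parallel for private(scratch) shared(total)
  for (int i = 0; i < n; ++i) {
    scratch = i * i;  // always written before it is read, so the private copy is safe
    #pragma omp atomic
    total += scratch;  // shared update must be protected against races
  }
  return total;
}
```

For a scalar like `total`, a `reduction(+:total)` clause would normally be the simpler and faster choice; the atomic version is used here only to make the shared access explicit.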

0
votes

OpenMP can perform this operation as of OpenMP 4.5, and GCC 6.3 (and possibly earlier versions) supports it. An example program looks as follows:

#include <vector>
#include <iostream>

int main(){
  std::vector<int> vec;

  #pragma omp declare reduction (merge : std::vector<int> : omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

  #pragma omp parallel for default(none) schedule(static) reduction(merge: vec)
  for(int i=0;i<100;i++)
    vec.push_back(i);

  for(const auto x: vec)
    std::cout<<x<<"\n";

  return 0;
}

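Note that omp_out and omp_in are special identifiers that OpenMP defines inside the combiner expression (omp_in is one partial result, and omp_out is the value it gets merged into), and that the type in the declare reduction must match the type of the vector you are planning to reduce on.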
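The same mechanism can also express the element-wise sum the question asks about, rather than a concatenation. The sketch below is one way to do it under the assumption that the compiler supports user-defined reductions; the names `vec_plus` and `column_sums` are hypothetical. The initializer clause sizes each thread's private vector to match the original, which the combiner relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Combiner: add omp_in into omp_out element by element.
// Initializer: each private copy starts as a zero vector of the original's size.
#pragma omp declare reduction(vec_plus : std::vector<int> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                   omp_out.begin(), std::plus<int>())) \
    initializer(omp_priv = std::vector<int>(omp_orig.size(), 0))

// Hypothetical helper: adds the row index r to every column, for r in [0, rows).
std::vector<int> column_sums(int rows, int cols) {
  assert(cols > 0);
  std::vector<int> sums(cols, 0);

  #pragma omp parallel for reduction(vec_plus : sums)
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      sums[c] += r;  // updates the thread's private vector
    }
  }
  return sums;
}
```

With `column_sums(5, 3)`, each of the three entries is 0 + 1 + 2 + 3 + 4 = 10. Unlike the merge example, the result here does not depend on the order in which the partial vectors are combined.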