I am trying to copy an array of C++ objects from the host to the GPU using OpenACC. I've looked online but found only vague documentation on how to do this, so I'm hoping someone knows the proper way. I'm using the nvhpc (formerly PGI) compiler with an NVIDIA GPU, so I have access to managed memory. I'd like an example using manual transfers, managed memory, or both.

Let's say I have a simple Assignment class (as in a school assignment):

class Assignment{
public:
   string name; //name of assignment
   double *grades; // list of grades where n is the number of students

   Assignment() {} // default constructor, needed to declare an array of Assignment
   Assignment(string name_, double *grades_) {
      name = name_;
      grades = grades_;
   }
};


int main(){
   Assignment assignments[2]; // array of 2 assignments

   double grades1[3] = {90.0, 95.0, 75.0};
   assignments[0] = Assignment("Assignment1", grades1);

   double grades2[3] = {50.0, 65.0, 55.0};
   assignments[1] = Assignment("Assignment2", grades2);

   //Transfer to GPU below...
   #pragma ...
}

How would I transfer this array of Assignment objects to the GPU using OpenACC manual memory management, managed memory, or both?

1 Answer

Unified memory (i.e., managed memory) is available for dynamically allocated (heap) memory. If you change the code to allocate the arrays dynamically and compile with the flag "-gpu=managed", the CUDA driver handles the data movement of those arrays.
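As a rough illustration of the managed-memory route, here is a minimal sketch, assuming the build line is something like "nvc++ -acc -gpu=managed". The helpers make_grades and curve_grades are hypothetical names made up for this example, and I have not verified the block on a GPU. The idea is only that every array the device touches comes from new[], so no data directives are needed.

```cpp
#include <cstddef>
#include <string>

// Same layout as the class in the question, plus a default constructor
// so that an array of Assignment can be allocated with new[].
class Assignment {
public:
   std::string name;          // name of assignment (only used on the host)
   double *grades = nullptr;  // one grade per student

   Assignment() = default;
   Assignment(const std::string &name_, double *grades_)
      : name(name_), grades(grades_) {}
};

// Hypothetical helper: heap-allocates a copy of n grades. Assumption: with
// -gpu=managed this allocation ends up in unified memory.
double *make_grades(const double *src, std::size_t n) {
   double *g = new double[n];
   for (std::size_t j = 0; j < n; ++j) g[j] = src[j];
   return g;
}

// Hypothetical helper: adds delta to every grade. No data clauses are used;
// the assumption is that managed memory makes both the Assignment array and
// the grades arrays reachable from the device.
void curve_grades(Assignment *assignments, int count, int students, double delta) {
   #pragma acc parallel loop
   for (int i = 0; i < count; ++i) {
      for (int j = 0; j < students; ++j) {
         assignments[i].grades[j] += delta;
      }
   }
}
```

To use it, allocate the array itself with "new Assignment[2]", fill each element with make_grades(...), call curve_grades(assignments, 2, 3, 5.0), and read the grades back on the host directly. Stack arrays such as grades1 in the question would not be managed, which is why the sketch copies them to the heap first.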

OpenACC data regions perform a shallow copy. For an aggregate data type with dynamic data members, you therefore need to do a manual deep copy, as shown below:

% cat grades.cpp
#include <iostream>
#include <string>

using namespace std;

class Assignment{
public:
   std::string name; //name of assignment
   double *grades; // list of grades where n is the number of students

   Assignment() : grades(nullptr) {}
   Assignment(string name_, double *grades_) {
      name = name_;
      grades = grades_;
   }
};


int main(){
   Assignment assignments[2]; // array of 2 assignments

   double grades1[3] = {90.0, 95.0, 75.0};
   assignments[0] = Assignment("Assignment1", grades1);

   double grades2[3] = {50.0, 65.0, 55.0};
   assignments[1] = Assignment("Assignment2", grades2);

   //Transfer to GPU below...
   #pragma acc enter data copyin(assignments[:2])
   for (int i=0; i < 2; ++i) {
        #pragma acc enter data copyin(assignments[i].grades[:3])
   }
   // changes the grades on the device
   #pragma acc parallel loop present(assignments)
   for (int i=0; i < 2; ++i) {
      for (int j=0; j < 3; ++j) {
         assignments[i].grades[j] += 5.0;
      }
   }

   for (int i=0; i < 2; ++i) {
        #pragma acc update self(assignments[i].grades[:3])
        std::cout << i << " : " << assignments[i].grades[0]
                  << ", " <<  assignments[i].grades[1]
                  << ", " <<  assignments[i].grades[2]
                  << std::endl;
   }


   for (int i=0; i < 2; ++i) {
        #pragma acc exit data delete(assignments[i].grades)
   }
   #pragma acc exit data delete(assignments)

}
% nvc++ grades.cpp -w -Minfo=accel -acc; a.out
main:
     26, Generating enter data copyin(assignments[:])
     32, Generating enter data copyin(assignments.grades[:3])
         Generating present(assignments[:])
         Generating Tesla code
         35, #pragma acc loop gang /* blockIdx.x */
         36, #pragma acc loop seq
     36, Loop is parallelizable
     43, Generating update self(assignments.grades[:3])
     52, Generating exit data delete(assignments.grades)
     55, Generating exit data delete(assignments[:])
0 : 95, 100, 80
1 : 55, 70, 60

I personally prefer to encapsulate the data management in the class itself. That is a bit trickier here, since you have an array of objects where each element is a copy of an anonymous temporary object. It may or may not be useful, but you can take a look at a simple vector-like class, "accList", that I wrote as an example for my chapter on data management in the book Parallel Programming with OpenACC. See: https://github.com/rmfarber/ParallelProgrammingWithOpenACC/tree/master/Chapter05
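To give a flavor of what encapsulating the data management can look like, here is a minimal sketch. GradeList and its method names are hypothetical and made up for this post; this is not the accList code from the book, and I have not run it on a GPU. The class owns its array and keeps the enter data, update, and exit data directives next to the data they describe.

```cpp
#include <cstddef>

// Hypothetical class: owns its grades and manages its own device copy.
class GradeList {
public:
   explicit GradeList(std::size_t n) : size_(n), data_(new double[n]()) {}
   ~GradeList() { delete[] data_; }
   // Copying is disabled so two objects never share one host or device array.
   GradeList(const GradeList &) = delete;
   GradeList &operator=(const GradeList &) = delete;

   double &operator[](std::size_t i) { return data_[i]; }
   std::size_t size() const { return size_; }

   // Shallow-copy the object first, then its array, so the device copy of
   // data_ can be pointed at the device array.
   void copy_to_device() {
      #pragma acc enter data copyin(this)
      #pragma acc enter data copyin(data_[0:size_])
   }
   // Bring the device values back to the host array.
   void update_host() {
      #pragma acc update self(data_[0:size_])
   }
   // Release in the reverse order: array first, then the object.
   void delete_from_device() {
      #pragma acc exit data delete(data_[0:size_])
      #pragma acc exit data delete(this)
   }
   // Assumes copy_to_device() was called earlier when built with -acc.
   void add(double delta) {
      #pragma acc parallel loop default(present)
      for (std::size_t i = 0; i < size_; ++i) data_[i] += delta;
   }

private:
   std::size_t size_;
   double *data_;
};
```

A typical sequence would be: construct GradeList g(3), fill g[0] through g[2] on the host, then call g.copy_to_device(), g.add(5.0), g.update_host(), and finally g.delete_from_device(). Disabling copies is a deliberate choice in this sketch: it sidesteps the copy-of-a-temporary problem mentioned above, at the cost of not being able to assign elements the way the question's code does.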