
I am trying to apply OpenACC optimizations to a many-body simulation. Currently, I am facing a problem that leads to the memory errors below:

call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
srun: error: jrc0017: task 0: Exited with exit code 1

I am using the pgc++ compiler with the flags -acc -Minfo=accel -ta=tesla -fast -std=c++11. I don't want to use -ta=tesla:managed because I want to manage memory myself.

#pragma acc kernels present(sim.part.rx, sim.part.ry, sim.part.rz, sim.part.vx, sim.part.vy, sim.part.vz) 
{
  for(int idx = 0; idx < sim.num; ++idx) {     // Loop over target particle
    float
      prx = sim.part.rx[idx],                   // my position
      pry = sim.part.ry[idx],
      prz = sim.part.rz[idx];
    float Fx = 0.f, Fy = 0.f, Fz = 0.f;          // Force
    #pragma acc loop 
    for(int jdx = 0; jdx < sim.num; ++jdx) {   // Loop over interaction partners
      if(idx != jdx) {                          // No self-force
        const float dx = prx - sim.part.rx[jdx]; // Distance to partner
        const float dy = pry - sim.part.ry[jdx];
        const float dz = prz - sim.part.rz[jdx];
        const float h  = 1.f/sqrt(dx*dx + dy*dy + dz*dz + eps);
        const float h3 = h*h*h;
        Fx += dx*h3;                            // Sum up force
        Fy += dy*h3;
        Fz += dz*h3;
      }
    }
    sim.part.vx[idx] += sim.mass*dt*Fx;         // update velocity
    sim.part.vy[idx] += sim.mass*dt*Fy;
    sim.part.vz[idx] += sim.mass*dt*Fz;
  }
}

If I delete the lines below,

sim.part.vx[idx] += sim.mass*dt*Fx;         // update velocity
sim.part.vy[idx] += sim.mass*dt*Fy;
sim.part.vz[idx] += sim.mass*dt*Fz;

my code runs without problems, but I get the memory error again if I restore them. It seems that the kernel tries to update sim.part.vx but the compiler doesn't know about it, which leads to the memory problem.

Does anyone know how to fix this problem?

Which compiler are you using for this? - talonmies
Compiler bug? Have you asked the site staff who operate this Cray for support? - Jeff Hammond
Have you tried taking sim.mass out from the struct to a separate scalar variable? - Ruyk
@Lbj_x @Ruyk I'd assumed that sim.part.vx was the problem, but if sim.mass is the problem, try adding firstprivate(sim.mass). The default behavior for scalars is to make them firstprivate, but the compiler may be getting tripped up by the fact that it's in a struct. - jefflarkin
@Lbj_x It may be that the struct itself isn't on the device. A common workaround is to create pointers to the structure members and work with them instead. For instance float *rx = sim.part.rx and then present(rx), but then you have to update every reference to rx. This isn't ideal, which is why the OpenACC technical committee is working on a better solution for structures. - jefflarkin
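
The two workarounds suggested in the comments above (pulling scalars such as sim.mass out of the struct, and aliasing the member arrays with plain pointers) can be combined. The following is only a sketch under assumptions: the Sim and Parts structs are a simplified, hypothetical 1D stand-in for the real data layout, update_vx is an invented name, and the arrays are assumed to have been placed on the device by the caller. Built without -acc, the pragmas are ignored and the loops run on the host.

```cpp
#include <cmath>

// Hypothetical, simplified 1D stand-in for the question's data layout.
struct Parts { float *rx, *vx; };
struct Sim   { int num; float mass; Parts part; };

// Assumes rx[0:n] and vx[0:n] are already on the device (e.g. via enter data)
// when built with OpenACC support.
void update_vx(Sim& sim, float dt, float eps) {
  // Hoist struct members into locals so the kernel never dereferences sim.
  const int n = sim.num;
  float mass  = sim.mass;
  float *rx   = sim.part.rx;
  float *vx   = sim.part.vx;

  #pragma acc parallel loop present(rx[0:n], vx[0:n]) firstprivate(mass, dt, eps)
  for (int i = 0; i < n; ++i) {            // Loop over target particle
    const float pr = rx[i];                // my position
    float F = 0.f;                         // Force
    #pragma acc loop reduction(+:F)
    for (int j = 0; j < n; ++j) {          // Loop over interaction partners
      if (i != j) {                        // No self-force
        const float dx = pr - rx[j];       // Distance to partner
        const float h  = 1.f / std::sqrt(dx * dx + eps);
        F += dx * h * h * h;               // Sum up force
      }
    }
    vx[i] += mass * dt * F;                // update velocity
  }
}
```

The explicit reduction on the inner loop is an addition of this sketch, not something from the original code; it states the accumulation into F explicitly instead of relying on the compiler to detect it.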

1 Answer


I suspect the problem is that sim and sim.part are not on the device (or the compiler doesn't realize that they're on the device). As a workaround, can you try introducing pointers to those arrays directly?

float *rx = sim.part.rx, *ry = sim.part.ry, *rz = sim.part.rz, 
      *vx = sim.part.vx, *vy = sim.part.vy, *vz = sim.part.vz;
#pragma acc kernels present(rx, ry, rz, vx, vy, vz) 
{
  for(int idx = 0; idx < sim.num; ++idx) {     // Loop over target particle
    float
      prx = rx[idx],                   // my position
      pry = ry[idx],
      prz = rz[idx];
    float Fx = 0.f, Fy = 0.f, Fz = 0.f;          // Force
    #pragma acc loop 
    for(int jdx = 0; jdx < sim.num; ++jdx) {   // Loop over interaction partners
      if(idx != jdx) {                          // No self-force
        const float dx = prx - rx[jdx]; // Distance to partner
        const float dy = pry - ry[jdx];
        const float dz = prz - rz[jdx];
        const float h  = 1.f/sqrt(dx*dx + dy*dy + dz*dz + eps);
        const float h3 = h*h*h;
        Fx += dx*h3;                            // Sum up force
        Fy += dy*h3;
        Fz += dz*h3;
      }
    }
    vx[idx] += sim.mass*dt*Fx;         // update velocity
    vy[idx] += sim.mass*dt*Fy;
    vz[idx] += sim.mass*dt*Fz;
  }
}

How are sim and sim.part allocated? It's possible to use unstructured data directives in the constructor and destructor to make sure that sim and sim.part are on the device too. If you've already done this, then another possible solution is to add present(sim, sim.part) to your existing present clause so the compiler knows that you've already taken care of those data structures too.
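
As a rough illustration of the constructor/destructor approach, here is a minimal sketch. The Particles class, its members, and update_host are hypothetical names, not the asker's actual types, and it assumes a compiler that accepts this in a data clause (support may vary between compiler versions). It only covers the data lifetime: the struct is copied to the device first, then its arrays, and the order is reversed on teardown.

```cpp
// Hypothetical container; the layout is an assumption, not the asker's code.
struct Particles {
  int n;
  float *rx, *vx;

  explicit Particles(int count)
      : n(count), rx(new float[count]), vx(new float[count]) {
    for (int i = 0; i < n; ++i) { rx[i] = 0.f; vx[i] = 0.f; }
    // Put the struct itself on the device first, then the arrays it points to.
    #pragma acc enter data copyin(this)
    #pragma acc enter data copyin(rx[0:n], vx[0:n])
  }

  ~Particles() {
    // Release the device copies in reverse order before freeing host memory.
    #pragma acc exit data delete(rx[0:n], vx[0:n])
    #pragma acc exit data delete(this)
    delete[] rx;
    delete[] vx;
  }

  // Copy device results back to the host; a no-op when built without -acc.
  void update_host() {
    #pragma acc update self(vx[0:n])
  }

  // Copying would duplicate ownership of both host and device buffers.
  Particles(const Particles&) = delete;
  Particles& operator=(const Particles&) = delete;
};
```

With something like this in place, a compute region can declare the data as present instead of moving it, and update_host() would be called before reading vx on the host.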