I am trying to apply OpenACC optimizations to a many-body simulation. I am currently facing a problem that leads to the following memory error:
```
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
srun: error: jrc0017: task 0: Exited with exit code 1
```
I am using the pgc++ compiler with the flags `-acc -Minfo=accel -ta=tesla -fast -std=c++11`. I don't want to use `-ta=tesla:managed` because I want to manage device memory myself.
```cpp
#pragma acc kernels present(sim.part.rx, sim.part.ry, sim.part.rz, sim.part.vx, sim.part.vy, sim.part.vz)
{
    for(int idx = 0; idx < sim.num; ++idx) { // Loop over target particle
        float
            prx = sim.part.rx[idx], // my position
            pry = sim.part.ry[idx],
            prz = sim.part.rz[idx];
        float Fx = 0.f, Fy = 0.f, Fz = 0.f; // Force
        #pragma acc loop
        for(int jdx = 0; jdx < sim.num; ++jdx) { // Loop over interaction partners
            if(idx != jdx) { // No self-force
                const float dx = prx - sim.part.rx[jdx]; // Distance to partner
                const float dy = pry - sim.part.ry[jdx];
                const float dz = prz - sim.part.rz[jdx];
                const float h = 1.f/sqrt(dx*dx + dy*dy + dz*dz + eps);
                const float h3 = h*h*h;
                Fx += dx*h3; // Sum up force
                Fy += dy*h3;
                Fz += dz*h3;
            }
        }
        sim.part.vx[idx] += sim.mass*dt*Fx; // update velocity
        sim.part.vy[idx] += sim.mass*dt*Fy;
        sim.part.vz[idx] += sim.mass*dt*Fz;
    }
}
```
If I comment out the following lines:

```cpp
sim.part.vx[idx] += sim.mass*dt*Fx; // update velocity
sim.part.vy[idx] += sim.mass*dt*Fy;
sim.part.vz[idx] += sim.mass*dt*Fz;
```

the code runs without problems. When I uncomment them, I get the memory error. It seems that the kernel tries to update `sim.part.vx`, but the compiler does not know about that device data, which leads to the illegal address.
Does anyone know how to fix this problem?
`sim.part.vx` was the problem, but if `sim.mass` is the problem, try adding `firstprivate(sim.mass)`. The default behavior for scalars is to make them firstprivate, but the compiler may be getting tripped up by the fact that it's in a struct. - jefflarkin

`float *rx = sim.part.rx` and then `present(rx)`, but then you have to update every reference to `rx`. This isn't ideal, which is why the OpenACC technical committee is working on a better solution for structures. - jefflarkin