I have the following CPU code:

 `if (number_of_pushed_particles < N && number_of_alive_particles < K)
 {
   push_particle();
   number_of_pushed_particles++;
 }`

Here number_of_pushed_particles, number_of_alive_particles, K and N are all of type int, and K and N are const. The function push_particle() is:

 `push_particle()
 {
   particles[LIFE].id=++MAX_ELEMENT;
   particles[LIFE].rx=0.0;
   particles[LIFE].ry=0.0;
   particles[LIFE].rz=0.0;
   ...
   ++LIFE;
 }`

Particle is a structure of floats. The array Particle particles[0:GL] and the integer variables LIFE and MAX_ELEMENT are statically allocated on the device. That is why I do not want to use #pragma acc update host/device before and after calling push_particle() and lose time copying data. How can I launch this sequential code on the GPU?

1 Answer

The OpenACC 2.6 standard, which was just ratified, includes a "serial" region, but it will be a while before support for it is added to the various compiler implementations.

The current method is to use a "parallel" region and set "num_gangs(1)" and "vector_length(1)", so that the body runs in a single gang with a single vector lane.

Something like:

push_particle()
{
  #pragma acc parallel num_gangs(1) vector_length(1) present(particles)
  {
    particles[LIFE].id=++MAX_ELEMENT;
    particles[LIFE].rx=0.0;
    particles[LIFE].ry=0.0;
    particles[LIFE].rz=0.0;
    ...
    ++LIFE;
  }
}
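
For reference, here is a slightly fuller sketch of the same idea that compiles on its own. Several things in it are assumptions and not taken from the question: the capacity GL = 1024, the reduced Particle struct (only the four fields shown in the question), the use of "declare create" to keep the globals resident on the device, and the helper functions init_device_counters() and alive_on_host(). It also lists LIFE and MAX_ELEMENT in the present clause as a cautious choice, so the scalars are not treated as per-region private copies. Without an OpenACC compiler the pragmas are ignored and the code simply runs on the host.

```c
#include <assert.h>

#define GL 1024  /* hypothetical capacity; the question does not give GL */

/* Only the fields visible in the question; the real struct has more. */
typedef struct { float id, rx, ry, rz; } Particle;

/* File-scope data whose device copies live for the whole program. */
Particle particles[GL];
int LIFE = 0;
int MAX_ELEMENT = 0;
#pragma acc declare create(particles, LIFE, MAX_ELEMENT)

/* Hypothetical one-time setup: create() does not initialize the device
   copies, so send the starting counter values to the device once. */
void init_device_counters(void)
{
    #pragma acc update device(LIFE, MAX_ELEMENT)
}

/* Runs the sequential body on the device in one gang with one vector lane,
   operating directly on the device-resident data. */
void push_particle(void)
{
    #pragma acc parallel num_gangs(1) vector_length(1) present(particles, LIFE, MAX_ELEMENT)
    {
        particles[LIFE].id = (float)(++MAX_ELEMENT);
        particles[LIFE].rx = 0.0f;
        particles[LIFE].ry = 0.0f;
        particles[LIFE].rz = 0.0f;
        ++LIFE;
    }
}

/* Hypothetical debugging helper: copies state back to the host only when
   the host actually needs to inspect it, then returns the particle count. */
int alive_on_host(void)
{
    #pragma acc update self(LIFE, MAX_ELEMENT, particles)
    assert(LIFE >= 0 && LIFE <= GL);
    return LIFE;
}
```

With this layout no data is copied per call to push_particle(); the only transfers are the one-time counter initialization and whatever you request explicitly through a helper like alive_on_host(). Once your compiler supports OpenACC 2.6, the parallel line could presumably be replaced by "#pragma acc serial" with the same present clause.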