Please help. 1) I need to use memcpy to move arrays allocated on the GPU. I cannot use std::memcpy because, according to the compiler output, it "has no acc routine". My code is
const int GL=100000;
Particle particles[GL];
int cp01[2][GL];
#pragma acc declare create(particles,cp01)
...
I read that cudaMemcpy can be used with OpenACC. In function_device() (which is not able to fill the array allocated on the GPU) I call from the host
#pragma acc data copy(cp)
{
cudaMemcpy(&particles[cp01[0][0]],&particles[cp01[1][0]],cp*sizeof(Particle),cudaMemcpyDeviceToDevice);
}
I use the header
#include <cuda_runtime.h>
to get access to CUDA, and build the project with
cmake ../src -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_CXX_FLAGS="-acc -Minfo=all -Mcuda=llvm"
The program compiles but does not work: it hangs with no output on the console. How can I move arrays allocated on the device (with cudaMemcpy or in some other way)? Is that one include enough to use CUDA? Am I building the project correctly (is -Mcuda=llvm necessary or not)?
2) I also have another question: if one writes
#pragma acc parallel loop
for(int i=0; i<N; ++i)
{...}
must the variable N be allocated on the host only, or may it also be on the GPU?
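
Regarding question 1, the kind of device-side move I am after could perhaps be written as a plain OpenACC loop instead of a memcpy call. This is only a sketch under assumptions: the Particle struct below is a hypothetical stand-in (my real definition is not shown here), move_particles is a made-up helper name, and the source and destination ranges are assumed not to overlap.

```cpp
// Hypothetical stand-in for the real Particle type (an assumption).
struct Particle { double x, y, z; };

const int GL = 100000;
Particle particles[GL];
#pragma acc declare create(particles)

// Copies `count` elements from particles[src...] to particles[dst...].
// With OpenACC enabled the loop runs on the device copy of `particles`;
// without it the pragmas are ignored and the loop runs on the host.
// Assumption: the two ranges do not overlap, since the iterations
// may execute in any order.
void move_particles(int dst, int src, int count)
{
    #pragma acc parallel loop present(particles)
    for (int i = 0; i < count; ++i)
        particles[dst + i] = particles[src + i];
}
```

A call such as move_particles(0, 50000, 1000) would be made from the host, with "#pragma acc update device(particles)" before it and "#pragma acc update self(particles)" after it whenever the host copy has to stay in sync.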