Temporary CUDA Device Arrays

Question

Having played around with CUDA for a few months now, I find myself experimenting more and trying to move away from the tutorial examples.

My question is this: if I just want to use arrays on the GPU as temporary storage, without copying them back to the host for display or output, can I simply create a device array with __device__ double array[numpoints]; ? Then, for anything I want to bring back from the GPU, I need to do the whole cudaMalloc, cudaMemcpy spiel, right? Additionally, is there any difference between the two methods? I thought they both create arrays in global memory.

Sagar Masuti · Accepted Answer · 2013-11-22T03:56:36

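See this description of the __device__ qualifier. If you declare the array __device__, you cannot pass it directly to cudaMemcpy from the host, but there are other ways to access it, mentioned in the link, such as the runtime's symbol functions (cudaMemcpyToSymbol, cudaMemcpyFromSymbol).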
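As a rough illustration of the __device__ route, here is a minimal sketch, assuming a compile-time size NUMPOINTS and a hypothetical kernel fill_scratch (neither comes from the question). It keeps a statically sized array in global memory as scratch space and, only when needed, reads it back on the host with cudaMemcpyFromSymbol rather than plain cudaMemcpy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed size for illustration; a __device__ array needs a compile-time constant.
#define NUMPOINTS 256

// Statically allocated scratch array living in global memory.
__device__ double scratch[NUMPOINTS];

// Hypothetical kernel: each thread writes one element of the scratch array.
__global__ void fill_scratch()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < NUMPOINTS) {
        scratch[i] = 2.0 * i;
    }
}

// Runs the kernel, copies the scratch array to the host, returns its last element.
// Returns -1.0 if the copy fails.
double read_last_scratch_value()
{
    double host_copy[NUMPOINTS];

    fill_scratch<<<(NUMPOINTS + 127) / 128, 128>>>();

    // The symbol itself is passed here, not a pointer obtained from cudaMalloc.
    cudaError_t err = cudaMemcpyFromSymbol(host_copy, scratch, sizeof(host_copy));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    return host_copy[NUMPOINTS - 1];
}

int main()
{
    printf("last scratch value: %f\n", read_last_scratch_value());
    return 0;
}
```

If the array is never read on the host, the cudaMemcpyFromSymbol call can simply be dropped; kernels can keep using scratch across launches.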

Instead, you can declare a global pointer (i.e., without __device__) in host code and allocate it with cudaMalloc. You can then use the same pointer to copy the result back to the host with cudaMemcpy.
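The cudaMalloc route might look like the sketch below. The names d_array, square_indices and NUMPOINTS are made up for illustration. Both approaches place the data in global memory; the practical difference is that the size here can be chosen at run time and the pointer is passed to kernels as an ordinary argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed size for illustration; with cudaMalloc it could also be a run-time value.
const int NUMPOINTS = 256;

// Global pointer declared in host code (no __device__); it holds a device address.
double *d_array = NULL;

// Hypothetical kernel: writes i*i into element i of the array it is given.
__global__ void square_indices(double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = (double)i * (double)i;
    }
}

// Allocates the device array, runs the kernel, copies the result back,
// and returns the last element. Returns -1.0 on any CUDA error.
double run_and_fetch_last()
{
    double host_copy[NUMPOINTS];
    size_t bytes = NUMPOINTS * sizeof(double);

    if (cudaMalloc((void **)&d_array, bytes) != cudaSuccess) {
        return -1.0;
    }

    square_indices<<<(NUMPOINTS + 127) / 128, 128>>>(d_array, NUMPOINTS);

    // The same pointer used by the kernel is the source of the copy.
    cudaError_t err = cudaMemcpy(host_copy, d_array, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_array);
    d_array = NULL;

    return (err == cudaSuccess) ? host_copy[NUMPOINTS - 1] : -1.0;
}

int main()
{
    printf("last element: %f\n", run_and_fetch_last());
    return 0;
}
```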
