Error when copying dynamically allocated device data to host?

Question

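I recently ran into a problem when copying dynamically allocated data from the device to host memory. The data is allocated in device code (the kernel below uses new, which draws on the same device runtime heap as device-side malloc), and I copy it from the device to the host in a host function. Here is the code: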

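#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 100

__device__ int* d_array;

__global__ void allocDeviceMemory()
{
    d_array = new int[N];
    for (int i = 0; i < N; i++)
        d_array[i] = 123;
}

int main()
{
    allocDeviceMemory<<<1, 1>>>();
    cudaDeviceSynchronize();

    // Read the pointer value stored in the __device__ variable. Pass the symbol
    // itself; string symbol names are not accepted by CUDA 5.0 and later.
    int* d_a = NULL;
    cudaMemcpyFromSymbol(&d_a, d_array, sizeof(d_a), 0, cudaMemcpyDeviceToHost);
    printf("gpu address: %p\n", (void*)d_a);

    // This is the copy that fails: d_a points into the device runtime heap,
    // which host API calls such as cudaMemcpy are not permitted to use.
    int* h_array = (int*)malloc(N * sizeof(int));
    cudaError_t err = cudaMemcpy(h_array, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("cudaMemcpy: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess)
        printf("h_array[0]: %d\n", h_array[0]);

    free(h_array);
    getchar();
    return 0;
}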

Another poster had the same issue with CUDA 4.1, and some experts suggested that upgrading the CUDA driver and runtime to a newer version would solve it: CUDA - Copy device data to host?

I have CUDA toolkit 4.2, the latest developer drivers and a C2075, but the problem persists. Please let me know how to solve it.

talonmies · Accepted Answer · 2012-06-26T04:14:03

Unfortunately there is no way to do what you are trying to do in CUDA 4. The host API cannot copy from dynamically allocated addresses on the device runtime heap; only device code can access them. If you want to copy with the host API, you will first need to have device code write the data into an "output" buffer allocated with the host API (for example with cudaMalloc). You are then free to retrieve that buffer from the host with cudaMemcpy.
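A minimal sketch of that workaround, assuming the question's N and d_array. It uses device-side malloc/free instead of the question's new (both draw on the same device heap), and the helper names copyHeapToBuffer, freeDeviceMemory and fetchHeapArray are hypothetical. A kernel copies out of the heap allocation into a cudaMalloc buffer, which the host can then read with an ordinary cudaMemcpy.

```cuda
#include <cuda_runtime.h>

#define N 100

// Pointer into the device runtime heap; only device code may dereference it.
__device__ int* d_array;

// Allocates from the device heap (like the question's kernel) and fills it.
__global__ void allocDeviceMemory()
{
    d_array = (int*)malloc(N * sizeof(int));  // NULL if the heap is exhausted
    if (d_array != NULL)
        for (int i = 0; i < N; i++)
            d_array[i] = 123;
}

// Hypothetical helper: device code copies the heap data into an output
// buffer that the host allocated with cudaMalloc.
__global__ void copyHeapToBuffer(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (d_array != NULL && i < n)
        out[i] = d_array[i];
}

// Hypothetical helper: heap memory must be released with device-side free(),
// not with cudaFree().
__global__ void freeDeviceMemory()
{
    if (d_array != NULL)
        free(d_array);
    d_array = NULL;
}

// Hypothetical host helper: fills h_out[0..N-1], returns the first error seen.
cudaError_t fetchHeapArray(int* h_out)
{
    int* d_out = NULL;
    cudaError_t err = cudaMalloc((void**)&d_out, N * sizeof(int));
    if (err != cudaSuccess)
        return err;

    // Zero the buffer so a failed heap allocation shows up as zeros, not garbage.
    err = cudaMemset(d_out, 0, N * sizeof(int));

    if (err == cudaSuccess) {
        allocDeviceMemory<<<1, 1>>>();
        copyHeapToBuffer<<<(N + 127) / 128, 128>>>(d_out, N);
        freeDeviceMemory<<<1, 1>>>();
        err = cudaGetLastError();  // catches launch failures
    }

    // d_out came from cudaMalloc, so the host API may copy from it; cudaMemcpy
    // also waits for the kernels queued on the default stream to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    return err;
}
```

In the question's main, a call such as fetchHeapArray(h_array) would take the place of the cudaMemcpyFromSymbol/cudaMemcpy pair. Because all three kernels and the memset are issued on the default stream, they run in order, so the copy kernel sees the filled heap array and the final cudaMemcpy reads a complete buffer.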

Mark Harris of NVIDIA has confirmed this limitation.

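Since this answer was posted in 2012, the restriction on host API interoperability appears to have been set in stone: the CUDA programming guide explicitly states that memory allocated by malloc() in device code cannot be used in any runtime or driver API calls, such as cudaMemcpy or cudaMemset.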
