I want to design a program in which the same device memory is accessed from kernels in multiple CUDA files. A simplified example is given below, in which main.c calls functions from three .cu files: cuda_malloc.cu, cuda_print.cu and cuda_free.cu.
main.c file: declares a pointer "d_array"
main()
{
    int maxpar = 10;
    float* d_array;
    cuda_malloc(maxpar, d_array);
    cuda_print(maxpar, d_array);
    cuda_free(d_array);
}
cuda_malloc.cu file: allocates device memory for d_array and sets values to zero.
extern "C" void cuda_malloc(int maxpar, float* d_array)
{
    CUDA_SAFE_CALL(cudaMalloc((void**)&d_array, sizeof(float)*maxpar));
    CUDA_SAFE_CALL(cudaMemset(d_array, '\0', sizeof(float)*maxpar));
}
cuda_print.cu file: calls "kernel" to print "d_array" from the device memory
extern "C"
{
    __global__ void kernel(int maxpar, float* d_array)
    {
        int tid = threadIdx.x;
        if (tid >= maxpar) return;
        printf("tId = %d, d_array[i] = %f \n", tid, d_array[tid]);
    }
    void cuda_print(int maxpar, float* d_array)
    {
        //If I un-comment the following 2 lines, the kernel function prints array values
        //otherwise, it does not
        //CUDA_SAFE_CALL(cudaMalloc((void**)&d_array,sizeof(float)*maxpar));
        //CUDA_SAFE_CALL(cudaMemset(d_array,'\0',sizeof(float)*maxpar));
        kernel <<<1, maxpar>>> (maxpar, d_array);
        cudaDeviceSynchronize();
        cudaGetLastError();
    }
}
cuda_free.cu file: frees the device memory
extern "C" void cuda_free(float* d_array)
{
    CUDA_SAFE_CALL(cudaFree(d_array));
}
This code compiles fine. Notice that I am trying to print "d_array" in the "kernel" function called from the "cuda_print.cu" file. However, nothing is printed, and no error is reported either. If I allocate the device memory again in the "cuda_print.cu" file and memset it to zero, then the kernel prints it.
My question is: how can I access the same device memory from multiple cuda files?
Thanks
The cuda_malloc function is incorrect. You must pass the pointer you allocate by reference, not by value, to that function. This doesn't really have anything to do with CUDA; it is a matter of understanding how pointers work in C. - talonmies
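To illustrate the comment above, here is a minimal sketch in plain C of the difference between passing a pointer by value and passing its address. It is not the asker's code: calloc stands in for cudaMalloc plus cudaMemset so that it runs without a GPU, and the names alloc_by_value and alloc_by_reference are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the original cuda_malloc: the pointer is passed
   by value, so the assignment below only changes the function's local copy. */
void alloc_by_value(int n, float *array)
{
    array = calloc((size_t)n, sizeof(float)); /* the caller never sees this address */
    free(array);                              /* freed here only so the sketch does not leak */
}

/* Hypothetical fixed variant: the caller passes the address of its pointer,
   so writing through *array updates the caller's variable. */
void alloc_by_reference(int n, float **array)
{
    *array = calloc((size_t)n, sizeof(float)); /* zero-initialised, like cudaMemset to 0 */
}
```

Applied to the question, this would presumably mean declaring cuda_malloc(int maxpar, float** d_array), calling cudaMalloc((void**)d_array, ...) and cudaMemset(*d_array, ...) inside it, and calling cuda_malloc(maxpar, &d_array) from main. With that change, cuda_print and cuda_free receive the address that was actually allocated.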