Passing a pointer to device memory between classes in CUDA

Question

I would appreciate some help with CUDA device memory pointers. I want to split my CUDA kernel code across multiple files, both for readability and because the program is large. To do that, I need to pass the same device memory pointers to multiple CUDA kernels (not simultaneously). Below is a rough example of what I need:

//random.h
class random{
public:
    int* dev_pointer_numbers;
};

So the object simply needs to store the pointer to device memory.

//random_kernel.cu
__global__ void doSomething(int *values){
    //do some processing
}

extern "C" void init_memory(int *devPtr,int *host_memory,int arraysize)
{
    cudaMalloc(&devPtr,arraysize*sizeof(int));
    cudaMemcpy(devPtr,host_memory,arraysize*sizeof(int),cudaMemcpyHostToDevice);
}

extern "C" void runKernel(int *devPtr){
    doSomething<<<1,1>>>(devPtr);
}

And the main file:

//main.cpp
//ignoring all the details etc
random rnd;
void CUDA(int *hostArray)
{
    init_memory(rnd.dev_pointer_numbers,hostArray,10);
    runKernel(rnd.dev_pointer_numbers);
}

I understand that when I run the kernel with the object's pointer, it isn't mapped to device memory, which is why the kernel fails. What I want to know is: how can I store the pointer to a particular block of device memory in my main file so that it can be reused by other CUDA kernel files?

Tom Tom · Accepted Answer · 2012-09-11T12:39:45

You're losing your pointer!

Check out your init_memory function:

void init_memory(int *devPtr,int *host_memory,int arraysize)
{
  cudaMalloc(&devPtr,arraysize*sizeof(int));
  cudaMemcpy(devPtr,host_memory,arraysize*sizeof(int),cudaMemcpyHostToDevice);
}

You pass in a pointer, so inside the function you have a local copy named devPtr. You then call cudaMalloc() with the address of that local copy. When the function returns, the local copy (on the stack) is destroyed, so the device address it held is lost and the caller's rnd.dev_pointer_numbers is never updated.

Instead try this:

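void init_memory(int **devPtr,int *host_memory,int arraysize)
{
  // devPtr is the address of the caller's pointer, so cudaMalloc
  // writes the device address straight into the caller's variable
  cudaMalloc(devPtr,arraysize*sizeof(int));
  cudaMemcpy(*devPtr,host_memory,arraysize*sizeof(int),cudaMemcpyHostToDevice);
}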

...

init_memory(&rnd.dev_pointer_numbers,hostArray,10);

As a side note, consider removing the extern "C". Since you're calling these functions from C++ (main.cpp), it serves no purpose and just clutters your code.
