In CUDA, how to copy an array of device pointers to device memory?

Question

For example, I allocate the following pointers:

float *data_1, *data_2, *data_3, *data_4;

//Use malloc to allocate memory and fill out some data to these pointers
......
//Filling complete

float *data_d1,*data_d2,*data_d3,*data_d4;

cudaMalloc((void **)&data_d1,size1);
cudaMalloc((void **)&data_d2,size2);
cudaMalloc((void **)&data_d3,size3);
cudaMalloc((void **)&data_d4,size4);

cudaMemcpy(data_d1,data_1,size1,cudaMemcpyHostToDevice);
cudaMemcpy(data_d2,data_2,size2,cudaMemcpyHostToDevice);
cudaMemcpy(data_d3,data_3,size3,cudaMemcpyHostToDevice);
cudaMemcpy(data_d4,data_4,size4,cudaMemcpyHostToDevice);

After this, I should have 4 device pointers whose buffers contain exactly the same data as the host buffers. Now I'd like to store these pointers in one array of pointers, as follows:

float *ptrs[4];

ptrs[0] = data_d1;
ptrs[1] = data_d2;
ptrs[2] = data_d3;
ptrs[3] = data_d4;

Now I'd like to pass this array of pointers to a CUDA kernel. However, since ptrs[4] itself lives in host memory, I need to allocate a copy of the array on the device. So I did this:

float **ptrs_d;
size_t size = 4 * sizeof(float*);
cudaMalloc((void **)&ptrs_d,size);
cudaMemcpy(ptrs_d,ptrs,size,cudaMemcpyHostToDevice);

And then invoke the kernel:

kernel_test<<<dimGrid,dimBlock>>>(ptrs_d, ...);
//Declaration should be 
//__global__ void kernel_test(float **ptrs_d, ...);

Inside kernel_test, I load the data like this:

if (threadIdx.x < length_of_data_1d)
{
    float element0 = (ptrs_d[0])[threadIdx.x];
}

It compiles fine, but when debugging I get an access violation error.

Perhaps there are many errors in my code, but I just want to understand why I can't pass device pointers this way and, if CUDA does allow passing an array of device pointers to a kernel, what the proper way to access them is.

So how should I fix this issue? Any suggestions are appreciated. Thanks in advance.

I don't see any obvious problems with your method. I built a simple program around what you have shown, and it works correctly for me; it is here. Your access violation may simply be an out-of-bounds array access caused by the length of your data and by code you haven't shown here. It might not have anything to do with your basic method for copying an array of device pointers. I suggest you provide a complete code that reproduces the problem, rather than a sequence of snippets. The problem is in something you haven't shown here. — Robert Crovella
Thank you very much for your suggestion. I've tried debugging a few times and finally learned that this method does work. The real problem is most likely the out-of-bounds issue you suggested, and I'm now trying to track it down. Thanks again for your quick help. — Coding_new_bird
Running with the debugger should certainly give you an idea of what is going wrong. Running your code with cuda-memcheck may also shed light on the issue. — Robert Crovella
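
To make the out-of-bounds diagnosis from the comments concrete, here is a minimal sketch of one way to guard against it. It is not the code linked in the comments. The names `CUDA_CHECK`, `scale_all`, `lens` and `lens_d` are hypothetical and do not appear in the question; the idea is simply to check every runtime call and to pass each array's length to the kernel so that no thread indexes past the end of its own array.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort if a runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: block a handles array a; lens[a] is its element count.
__global__ void scale_all(float **ptrs, const int *lens, int n_arrays, float factor)
{
    int a = blockIdx.x;
    if (a >= n_arrays) return;
    // Each thread strides over the array, bounded by that array's own length.
    for (int i = threadIdx.x; i < lens[a]; i += blockDim.x)
        ptrs[a][i] *= factor;
}

int main()
{
    const int n_arrays = 2;
    const int lens[n_arrays] = {3, 5};   // arrays of different lengths
    float *ptrs[n_arrays];               // host array holding device pointers

    for (int a = 0; a < n_arrays; ++a) {
        float host[5] = {1.f, 2.f, 3.f, 4.f, 5.f};   // illustrative data
        CUDA_CHECK(cudaMalloc((void **)&ptrs[a], lens[a] * sizeof(float)));
        CUDA_CHECK(cudaMemcpy(ptrs[a], host, lens[a] * sizeof(float),
                              cudaMemcpyHostToDevice));
    }

    // Device copies of the pointer array and of the length array.
    float **ptrs_d = 0;
    int *lens_d = 0;
    CUDA_CHECK(cudaMalloc((void **)&ptrs_d, sizeof(ptrs)));
    CUDA_CHECK(cudaMalloc((void **)&lens_d, sizeof(lens)));
    CUDA_CHECK(cudaMemcpy(ptrs_d, ptrs, sizeof(ptrs), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(lens_d, lens, sizeof(lens), cudaMemcpyHostToDevice));

    scale_all<<<n_arrays, 32>>>(ptrs_d, lens_d, n_arrays, 2.0f);
    CUDA_CHECK(cudaGetLastError());        // reports launch failures
    CUDA_CHECK(cudaDeviceSynchronize());   // reports faults raised while the kernel ran

    float back[5];
    CUDA_CHECK(cudaMemcpy(back, ptrs[1], lens[1] * sizeof(float),
                          cudaMemcpyDeviceToHost));
    printf("%g %g\n", back[0], back[4]);   // on a working GPU: 2 10

    for (int a = 0; a < n_arrays; ++a) CUDA_CHECK(cudaFree(ptrs[a]));
    CUDA_CHECK(cudaFree(ptrs_d));
    CUDA_CHECK(cudaFree(lens_d));
    return 0;
}
```

If the loop bound used a single shared length instead of `lens[a]`, the threads working on the shorter array would read and write past its allocation, which is the kind of fault that shows up as an access violation in the debugger.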

hubs · Accepted Answer · 2013-12-10T14:51:40

One possibility is to allocate a void pointer, which is also what CUDA expects by default. When passing it to your kernel, you can cast it to float**. I did it this way:

void* ptrs_d = 0;
cudaMalloc(&ptrs_d, 4*sizeof(float*));
cudaMemcpy(ptrs_d, ptrs, 4*sizeof(float*), cudaMemcpyHostToDevice);
kernel_test<<<dimGrid, dimBlock>>>((float**)ptrs_d);
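
For completeness, the snippet above can be placed in a small self-contained program. This is only a sketch under assumptions: the kernel body, the `out_d` result buffer and the one-element arrays are invented for illustration and are not from the question or the answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel body: thread i copies the first element of array i to out[i].
__global__ void kernel_test(float **ptrs, float *out, int n_arrays)
{
    int i = threadIdx.x;
    if (i < n_arrays)
        out[i] = ptrs[i][0];
}

int main()
{
    const int n_arrays = 4;
    float *ptrs[n_arrays];   // host array of device pointers, as in the question

    for (int i = 0; i < n_arrays; ++i) {
        float value = 10.0f * (i + 1);   // illustrative data: 10, 20, 30, 40
        cudaMalloc((void **)&ptrs[i], sizeof(float));
        cudaMemcpy(ptrs[i], &value, sizeof(float), cudaMemcpyHostToDevice);
    }

    // The approach from the answer: allocate through a void*, cast at the launch.
    void *ptrs_d = 0;
    cudaMalloc(&ptrs_d, n_arrays * sizeof(float *));
    cudaMemcpy(ptrs_d, ptrs, n_arrays * sizeof(float *), cudaMemcpyHostToDevice);

    float *out_d = 0;
    cudaMalloc((void **)&out_d, n_arrays * sizeof(float));

    kernel_test<<<1, 32>>>((float **)ptrs_d, out_d, n_arrays);

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish
    // and returns any error the kernel raised.
    float out[n_arrays];
    cudaError_t err = cudaMemcpy(out, out_d, sizeof(out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n_arrays; ++i)
        printf("out[%d] = %g\n", i, out[i]);

    for (int i = 0; i < n_arrays; ++i) cudaFree(ptrs[i]);
    cudaFree(ptrs_d);
    cudaFree(out_d);
    return 0;
}
```

The cast only changes the static type of the pointer. Declaring `float **ptrs_d` as in the question and passing `(void **)&ptrs_d` to cudaMalloc stores the same device address, so both variants should behave identically.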
