For example, suppose I allocate the following pointers:
float *data_1, *data_2, *data_3, *data_4;
//Use malloc to allocate host memory and fill these pointers with data
......
//Filling complete
float *data_d1, *data_d2, *data_d3, *data_d4;
cudaMalloc((void **)&data_d1, size1);
cudaMalloc((void **)&data_d2, size2);
cudaMalloc((void **)&data_d3, size3);
cudaMalloc((void **)&data_d4, size4);
cudaMemcpy(data_d1, data_1, size1, cudaMemcpyHostToDevice);
cudaMemcpy(data_d2, data_2, size2, cudaMemcpyHostToDevice);
cudaMemcpy(data_d3, data_3, size3, cudaMemcpyHostToDevice);
cudaMemcpy(data_d4, data_4, size4, cudaMemcpyHostToDevice);
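(To rule out an earlier API failure, the return value of every cudaMalloc/cudaMemcpy above could be checked. A minimal sketch, using a hypothetical check_cuda helper that is not part of my original code:)
#include <cstdio>
#include <cstdlib>
// Hypothetical helper: abort with a message if a CUDA runtime call failed
static void check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}
// Example usage on the calls above
check_cuda(cudaMalloc((void **)&data_d1, size1), "cudaMalloc data_d1");
check_cuda(cudaMemcpy(data_d1, data_1, size1, cudaMemcpyHostToDevice), "cudaMemcpy data_d1");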
After this, I should have 4 device pointers holding exactly the same data as the host pointers. Now I'd like to store these pointers in one array of pointers, as follows:
float *ptrs[4];
ptrs[0] = data_d1;
ptrs[1] = data_d2;
ptrs[2] = data_d3;
ptrs[3] = data_d4;
Now I'd like to pass this array of pointers to a CUDA kernel. However, I know that since the array ptrs itself lives in host memory, I need to allocate a corresponding array on the device. So I did this:
float **ptrs_d;
size_t size = 4 * sizeof(float *);
cudaMalloc((void **)&ptrs_d, size);
cudaMemcpy(ptrs_d, ptrs, size, cudaMemcpyHostToDevice);
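(As a sanity check that the pointer values actually reached the device, not something in my original code, the array could be copied straight back and compared:)
float *check[4];
cudaMemcpy(check, ptrs_d, size, cudaMemcpyDeviceToHost);
// check[i] should now compare equal to ptrs[i] for i = 0..3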
Then I invoke the kernel:
kernel_test<<<dimGrid,dimBlock>>>(ptrs_d, ...);
//Declaration should be
//__global__ void kernel_test(float **ptrs_d, ...);
Inside kernel_test, I load the data with the following syntax:
if (threadIdx.x < length_of_data_1d)
{
    float element0 = ptrs_d[0][threadIdx.x];
}
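(For completeness, my understanding is that the full kernel would look roughly like the sketch below; passing the length in as a parameter is my own addition, and the names are only examples:)
__global__ void kernel_test(float **ptrs_d, int length_of_data_1d)
{
    if (threadIdx.x < length_of_data_1d)
    {
        // ptrs_d[0] holds the device address that was data_d1 on the host
        float element0 = ptrs_d[0][threadIdx.x];
        // ... element0 would be used here ...
    }
}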
It compiles fine, but when I run it under the debugger I get an access violation error.
Perhaps there are other errors in my code, but I mainly want to understand why I can't pass device pointers this way and, if CUDA does allow passing an array of device pointers to a kernel function, what the proper way to access them is.
So how should I fix this issue? Any suggestions are appreciated. Thanks in advance.
cuda-memcheck may also shed light on the issue. – Robert Crovella
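(Example invocation, assuming the executable is named my_app; the name is only illustrative:)
cuda-memcheck ./my_app
It reports invalid device memory reads and writes along with the kernel that triggered them.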