I'm relatively new to CUDA programming, so I want to clarify the behaviour of a struct when I pass it into a kernel. I've defined the following struct to somewhat imitate the behavior of a 3D array that knows its own size:
struct protoarray {
size_t dim1;
size_t dim2;
size_t dim3;
float* data;
};
I create two variables of type protoarray, dynamically allocate space to data via malloc and cudaMalloc on the host and device side, and update dim1, dim2 and dim3 to reflect the size of array I want this struct to represent. I read in this thread that the struct should be passed via copy. So this is what I do in my kernel
__global__ void kernel(curandState_t *state, protoarray arr_device){
const size_t dim1 = arr_device.dim1;
const size_t dim2 = arr_device.dim2;
for(size_t j(0); j < dim2; j++){
for(size_t i(0); i < dim1; i++){
// Do something
}
}
}
The struct is passed by copy, so all its contents are copied into shared memory of each block. This is where I'm getting bizarre behaviour, which I'm hoping you could help me with. Suppose I had set arr_device.dim1 = 2 on the host side. While debugging inside the kernel and setting a breakpoint at one of the for loops, checking the value of arr_device.dim1 yields something like 16776576, nowhere large enough to cause overflow, but this value copies correctly into dim1 as 2, which means that the for loops execute as I intended them to. As a side question, is using size_t which is essential unsigned long long int bad practice, seeing as the GPU's are made of 32bit cores?
Generally, how safe is it to pass struct and class into kernels as arguments, is bad practice that should be avoided at all cost? I imagine that passing pointers to classes to kernels is difficult in case they contain members which point to dynamically allocated memory, and that they should be very lightweight if I want to pass them by value.
size_tis the same size on both the CPU and GPU? (Andstructs without constructors are generally safe, I believe.) - Ken Y-Nsizof(size_t)andsizeof(void *)are the same on the GPU and device for whatever platform is being used. It might not be the same from platform to platform - talonmiesunsigned int, and now the values match. Could you please refer me to some source which refers to the issue ofstruct's with constructors in kernels? @talonmies ""The struct is passed by copy, so all its contents are copied into shared memory of each block" this is essentially a citation from the NVIDIA developer forum thread I linked. - konovification