I'm writing CUDA kernel code that uses shared memory, but I'm having trouble declaring shared memory variables.
The problem appears when I statically allocate several shared memory arrays, as follows:
```cuda
__global__
void kernel_func(float *global_matrix) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    // Print once, from thread (0,0) of block (0,0)
    if ( blockIdx.x==0 && blockIdx.y==0 && threadIdx.x==0 && threadIdx.y==0 )
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
               sizeof(sm_mat1), sizeof(sm_mat2), sizeof(sm_mat3));
    ...
}
```
However, when I run it, it prints this strange output:

```
sizeof(sm_mat1)=64, sizeof(sm_mat2)=0, sizeof(sm_mat3)=128
```
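For reference, with 4-byte floats the three arrays should be 64, 144 and 108 bytes. One thing worth ruling out is the format string: `sizeof` yields a `size_t`, while `%d` expects an `int`, and passing a mismatched type is undefined behavior that, on a 64-bit build, can make `printf` read arguments from the wrong offsets. The following is a minimal sketch, assuming a single test block, that casts each size to `int` before printing and also copies the values back to the host; the names `print_shared_sizes` and `query_shared_sizes` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints the compile-time sizes of three statically allocated shared arrays
// and stores them in out_sizes so the host can inspect them as well.
// Casting sizeof (a size_t) to int makes each argument match "%d".
__global__ void print_shared_sizes(int *out_sizes) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    if (blockIdx.x == 0 && blockIdx.y == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0) {
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
               (int)sizeof(sm_mat1), (int)sizeof(sm_mat2), (int)sizeof(sm_mat3));
        out_sizes[0] = (int)sizeof(sm_mat1);
        out_sizes[1] = (int)sizeof(sm_mat2);
        out_sizes[2] = (int)sizeof(sm_mat3);
    }
}

// Hypothetical host helper: launches one 2x2 block, then copies the sizes back.
// The blocking cudaMemcpy also waits for the kernel, which flushes device printf.
// Returns true if every CUDA call succeeded.
bool query_shared_sizes(int host_sizes[3]) {
    int *d_sizes = nullptr;
    if (cudaMalloc(&d_sizes, 3 * sizeof(int)) != cudaSuccess) return false;
    print_shared_sizes<<<1, dim3(2, 2)>>>(d_sizes);
    cudaError_t err = cudaMemcpy(host_sizes, d_sizes, 3 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_sizes);
    return err == cudaSuccess;
}
```

On the host, `int s[3]; query_shared_sizes(s);` should leave `s` holding 64, 144 and 108 with the declarations above.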
It seems that the 2nd matrix is not allocated and the 3rd matrix takes the place of the 2nd. In fact, accessing the 2nd matrix does not work correctly (I cannot read or write its data).
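To check whether the second array is really unusable, independently of `printf`, a small read/write test can help. The sketch below assumes a single 6x6 thread block: each thread writes distinct values into all three shared arrays, and after `__syncthreads()` thread (0,0) reads every element back and counts mismatches. The names `check_shared_rw` and `run_shared_rw_check` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Fills three statically allocated shared arrays from a 6x6 thread block,
// then has thread (0,0) read every element back and count mismatches.
__global__ void check_shared_rw(int *mismatches) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    int tx = threadIdx.x, ty = threadIdx.y;

    // Each thread writes distinct, recognizable values (small integers are exact in float).
    if (tx < 4 && ty < 4) sm_mat1[ty][tx] = 100.0f + ty * 4 + tx;
    sm_mat2[ty][tx] = 200.0f + ty * 6 + tx;
    if (tx < 3 && ty < 3)
        for (int z = 0; z < 3; ++z)
            sm_mat3[z][ty][tx] = 300.0f + z * 9 + ty * 3 + tx;

    __syncthreads();  // make all shared-memory writes visible to thread (0,0)

    if (tx == 0 && ty == 0) {
        int bad = 0;
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                if (sm_mat1[i][j] != 100.0f + i * 4 + j) ++bad;
        for (int i = 0; i < 6; ++i)
            for (int j = 0; j < 6; ++j)
                if (sm_mat2[i][j] != 200.0f + i * 6 + j) ++bad;
        for (int z = 0; z < 3; ++z)
            for (int i = 0; i < 3; ++i)
                for (int j = 0; j < 3; ++j)
                    if (sm_mat3[z][i][j] != 300.0f + z * 9 + i * 3 + j) ++bad;
        *mismatches = bad;
    }
}

// Hypothetical host helper: returns the mismatch count, or -1 on a CUDA error.
int run_shared_rw_check() {
    int *d_bad = nullptr;
    int h_bad = -1;
    if (cudaMalloc(&d_bad, sizeof(int)) != cudaSuccess) return -1;
    check_shared_rw<<<1, dim3(6, 6)>>>(d_bad);
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&h_bad, d_bad, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_bad = -1;
    cudaFree(d_bad);
    return h_bad;
}
```

A return value of 0 means every element of all three arrays was read back exactly as written; -1 signals a CUDA error rather than a data problem.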
I'm using a GTX 480 (compute capability 2.0), and I compile with the option `-arch=sm_20` so that device-side `printf` is available.
Does anyone have any idea what is going on?