CUDA shared memory cannot be allocated

Question

I'm writing CUDA kernel code that uses shared memory, but I'm having trouble declaring shared memory variables.

It happens when I try to statically allocate several shared memory arrays, as follows.

__global__
void kernel_func(float *global_matrix) {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    if ( blockIdx.x==0 && blockIdx.y==0 && threadIdx.x==0 && threadIdx.y==0 )
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
                    sizeof(sm_mat1), sizeof(sm_mat2), sizeof(sm_mat3));

    ...
}

However, when I run it, it prints the following strange output:

sizeof(sm_mat1)=64, sizeof(sm_mat2)=0, sizeof(sm_mat3)=128

It seems the 2nd matrix is not allocated, and the 3rd matrix is allocated in place of the 2nd. In fact, accessing the 2nd matrix does not work correctly (I cannot read or write data).

I'm using a GTX 480 and cuda2.0. (I compile with the option -arch=sm_20 so that I can print messages.)

Does anyone have any thoughts?

SteelRaven · Accepted Answer · 2014-02-14T06:21:06

The sizeof operator returns std::size_t, not int. On systems where sizeof(size_t) == 8 and sizeof(int) == 4, passing its result to printf and printing it with %d splits one sizeof result into two halves, which are then consumed by consecutive %d specifiers. The zero printed in place of the second specifier is the upper half of the first sizeof result.
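The arithmetic behind that explanation can be sketched in plain host code inside a .cu file. This only illustrates how a 64-bit value breaks into two 32-bit halves; it does not reproduce what printf does internally. The helper names low_half and high_half are hypothetical, and the sketch assumes sizeof(float) == 4.

```cuda
#include <cstdio>
#include <cstdint>

// Hypothetical helpers (not part of CUDA): split a 64-bit value into 32-bit halves.
constexpr __host__ __device__ uint32_t low_half(uint64_t v) {
    return static_cast<uint32_t>(v & 0xFFFFFFFFu);
}
constexpr __host__ __device__ uint32_t high_half(uint64_t v) {
    return static_cast<uint32_t>(v >> 32);
}

int main() {
    // sizeof yields std::size_t; widen it to 64 bits explicitly for the illustration.
    // Assumption: sizeof(float) == 4, so a 4x4 float array occupies 64 bytes.
    const uint64_t size_of_mat1 = sizeof(float[4][4]);

    // The casts make each argument match its %u specifier exactly.
    std::printf("low half = %u, high half = %u\n",
                static_cast<unsigned>(low_half(size_of_mat1)),
                static_cast<unsigned>(high_half(size_of_mat1)));
    return 0;
}
```

The low half carries the value 64 and the high half is 0, which matches the 64 followed by 0 in the question's output.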

To get the correct output, either cast the result of sizeof explicitly to int or use the %ld or %lld specifiers.

However, I also can't reproduce the bug with the incorrect second array size.
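A minimal sketch of both fixes, assuming a device of compute capability 2.0 or higher so that printf is available in kernels. The kernel name print_shared_sizes and the single-thread launch are choices made for this illustration, not the asker's actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: declares the same three shared arrays as the question
// and prints their sizes with format specifiers that match the argument types.
__global__ void print_shared_sizes() {
    __shared__ float sm_mat1[4][4];
    __shared__ float sm_mat2[6][6];
    __shared__ float sm_mat3[3][3][3];

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        // Fix 1: cast each sizeof result to int so that it matches %d.
        printf("sizeof(sm_mat1)=%d, sizeof(sm_mat2)=%d, sizeof(sm_mat3)=%d.\n",
               (int)sizeof(sm_mat1), (int)sizeof(sm_mat2), (int)sizeof(sm_mat3));

        // Fix 2: widen to unsigned long long and print with %llu.
        printf("sizeof(sm_mat1)=%llu, sizeof(sm_mat2)=%llu, sizeof(sm_mat3)=%llu.\n",
               (unsigned long long)sizeof(sm_mat1),
               (unsigned long long)sizeof(sm_mat2),
               (unsigned long long)sizeof(sm_mat3));
    }
}

int main() {
    print_shared_sizes<<<1, 1>>>();

    // Check for launch errors, then wait for the kernel to finish.
    // Synchronizing also flushes the device-side printf buffer.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

With 4-byte floats, both lines should report 64, 144, and 108 bytes. The explicit casts keep each argument the same width as its specifier, so no value can spill into the next one.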
