What CUDA shared memory size means

Question

I have been trying to solve this problem myself but I can't, so I would like your advice.

I am writing kernel code like this. The GPU is a GTX 580.

xxxx <<< blockNum, threadNum, SharedSize >>> (... threadNum ...)
(Note: SharedSize is set to 2*threadNum.)

__global__ void xxxx(..., int threadNum, ...)
{
    extern __shared__ int shared[];
    int* sub_arr = &shared[0];
    int* sub_numCounting = &shared[threadNum];
    ...
}

My program launches about 1085 blocks with 1024 threads per block.

(I am trying to process a very large array.)

So the shared memory size per block is 8192 (1024*2*4) bytes, right?

From cudaDeviceProp I found that I can use at most 49152 bytes of shared memory per block on the GTX 580.

I also know that the GTX 580 has 16 multiprocessors and that a thread block runs on one multiprocessor.

But my program fails with an error, even though 8192 bytes < 49152 bytes.

I used "printf" in the kernel to check whether it runs correctly, but many blocks do not run. (Although I launch 1085 blocks, only about 50 to 100 blocks actually run.)

I also want to know whether blocks that run on the same multiprocessor share the same shared memory addresses. (If not, is separate shared memory allocated for each block?)

I don't fully understand what the maximum size of shared memory per block means.

Please give me some advice.

Hopefully the shared memory size is 2*threadNum*sizeof(int), otherwise your problem isn't asking for too much shared memory, it is too little. — talonmies
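
To illustrate the comment above: the third launch parameter is a size in bytes, so two int arrays of threadNum elements each need 2*threadNum*sizeof(int) bytes, not 2*threadNum. The following is only a minimal sketch. The kernel name splitShared, its body, and the output buffer are hypothetical stand-ins for the parts the question elides, and the error checks are added for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the elided one in the question.
// It splits the dynamic shared memory into two int arrays of threadNum elements.
__global__ void splitShared(int *out, int threadNum)
{
    extern __shared__ int shared[];
    int *sub_arr = &shared[0];
    int *sub_numCounting = &shared[threadNum];

    int t = threadIdx.x;
    sub_arr[t] = t;
    sub_numCounting[t] = 2 * t;   // out of bounds if only 2*threadNum bytes were requested
    __syncthreads();

    out[blockIdx.x * threadNum + t] = sub_arr[t] + sub_numCounting[t];
}

int main()
{
    const int blockNum = 1085;    // values taken from the question
    const int threadNum = 1024;

    // Size in BYTES: 2 arrays * 1024 ints * 4 bytes = 8192 bytes.
    const size_t sharedBytes = 2 * threadNum * sizeof(int);

    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, (size_t)blockNum * threadNum * sizeof(int));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    splitShared<<<blockNum, threadNum, sharedBytes>>>(d_out, threadNum);

    // Check both the launch itself and the kernel execution.
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Checking the status after the launch is one way to see whether the kernel actually ran to completion instead of inferring it from printf output.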

chaohuang · Accepted Answer · 2012-07-16T15:18:02

Yes, blocks on the same multiprocessor share the same pool of shared memory, which is 48 KB per multiprocessor on your GPU (compute capability 2.0). So if N blocks are resident on the same multiprocessor, the maximum size of shared memory per block is (48/N) KB.
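
The 48/N relation can be turned around to estimate how many blocks fit on one multiprocessor for a given request. The sketch below is a rough illustration, not a definitive method. It assumes device 0, uses the hypothetical placeholder kernel dummyKernel, and treats sharedMemPerBlock as the shared memory budget, following the 48 KB figure above. Shared memory is only one of several limits on resident blocks, so the sketch also asks the runtime through cudaOccupancyMaxActiveBlocksPerMultiprocessor, which assumes a toolkit recent enough to provide that call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel that uses dynamic shared memory.
__global__ void dummyKernel(int *out)
{
    extern __shared__ int shared[];
    shared[threadIdx.x] = threadIdx.x;
    __syncthreads();
    if (out != nullptr)
        out[blockIdx.x * blockDim.x + threadIdx.x] = shared[threadIdx.x];
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);   // assumes device 0
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int threadNum = 1024;                              // from the question
    const size_t sharedBytes = 2 * threadNum * sizeof(int);  // 8192 bytes

    // Upper bound from shared memory alone: budget / bytes requested per block.
    const size_t bySharedMem = prop.sharedMemPerBlock / sharedBytes;

    // Resident blocks per multiprocessor with all limits considered
    // (threads, registers, shared memory), as reported by the runtime.
    int resident = 0;
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &resident, dummyKernel, threadNum, sharedBytes);
    if (err != cudaSuccess) {
        printf("occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("multiprocessors:                 %d\n", prop.multiProcessorCount);
    printf("shared memory budget (bytes):    %zu\n", prop.sharedMemPerBlock);
    printf("blocks per SM by shared memory:  %zu\n", bySharedMem);
    printf("resident blocks per SM overall:  %d\n", resident);
    return 0;
}
```

If the overall figure is lower than the shared memory bound, another resource such as the number of threads per multiprocessor is the tighter limit.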
