What is the real amount of shared memory per block on sm13?

Question

Maximum number of resident blocks per multiprocessor: 8
Maximum amount of shared memory per multiprocessor: 16 KB

Does this mean that if I have many blocks running, each of them can have only 2 KB of shared memory? If not, and every block can still have 16 KB of shared memory, where is it stored when 2 blocks with 16 KB each are executing on a single MP?

Please stop. Asking 8 questions in 24 hours, most of which could be answered by reading a handful of pages from the CUDA programming guide, is verging on abuse of Stack Overflow. — talonmies
Looks like Nvidia had removed everything related to cc<3.0 from the docs! Maybe @Nexen did a good thing after all :) — ogurets

Robert Crovella · Accepted Answer · 2013-12-22T15:34:56

All of the blocks running on a multiprocessor must share all of its resources (registers, shared memory, etc.).

If your threadblock uses shared memory, the first rule it must satisfy is that it cannot use more than what is available in the SM (i.e. 16KB in this case).

If the threadblock requires less than 16KB, then it may be possible to have multiple threadblocks executing on the SM. For example, two threadblocks could be executing if each uses approximately 8KB. Four threadblocks could be executing if each uses slightly less than 4KB (there is usually some overhead).

If you wanted the maximum of 8 threadblocks to be able to execute at once on a given SM (multiprocessor), then you would have to ensure in your code that the threadblock uses no more than 2KB of shared memory (probably a little less than 2KB).

If each threadblock used 16KB of shared memory, it simply means that additional threadblocks will wait in a queue until that threadblock is finished on that SM, before they begin to execute.

If a threadblock attempted to use more than 16KB (in this case), you would get a kernel launch error.
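The arithmetic above can be sketched in plain host-side C++. This is only an illustration of the numbers quoted in the question (16 KB of shared memory and 8 resident blocks per multiprocessor); the function name `residentBlocksPerSM` and the constants are hypothetical and not part of any CUDA API. The sketch also assumes shared memory and the block-count limit are the only constraints, so it ignores registers, thread limits, and allocation granularity. The per-block figure passed in should already include whatever overhead applies.

```cpp
#include <algorithm>
#include <cstddef>

// Limits quoted in the question for this device (assumed, not queried).
constexpr std::size_t kSharedMemPerSM = 16 * 1024;  // bytes per multiprocessor
constexpr int kMaxResidentBlocksPerSM = 8;          // resident blocks per multiprocessor

// Hypothetical helper: upper bound on the number of threadblocks that can be
// resident on one multiprocessor, considering only shared memory and the
// block-count limit. Returns 0 when a single block does not fit, which
// corresponds to the kernel launch error case.
int residentBlocksPerSM(std::size_t sharedBytesPerBlock) {
    if (sharedBytesPerBlock > kSharedMemPerSM) {
        return 0;  // one block alone exceeds the multiprocessor's shared memory
    }
    if (sharedBytesPerBlock == 0) {
        return kMaxResidentBlocksPerSM;  // shared memory is not the limiting factor
    }
    // How many blocks of this size fit side by side in shared memory.
    const std::size_t bySharedMem = kSharedMemPerSM / sharedBytesPerBlock;
    // The hardware block-count limit still caps the result.
    return static_cast<int>(
        std::min<std::size_t>(bySharedMem, kMaxResidentBlocksPerSM));
}
```

Under these assumptions, `residentBlocksPerSM(8 * 1024)` gives 2 and `residentBlocksPerSM(2 * 1024)` gives 8. A block that needs slightly more than 4 KB, say 4100 bytes, drops to 3 resident blocks, which is why each block has to stay a little under the even split.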
