
I am currently trying to improve the occupancy of my kernel. For that I use the CUDA Occupancy Calculator together with the device information reported by the SDK sample deviceQuery. I'm puzzled by a slight difference in how the two label their values: per block versus per streaming multiprocessor (SM). In the SDK sample the values are called

total amount of shared memory per block

and

total number of registers available per block

But in the occupancy calculator the same values are given per SM, which makes more sense to me.

Is that simply a mislabeling in the SDK sample?


I agree with you.

Shared memory and registers are hardware resources of each multiprocessor, whereas a block is a concept of the software programming model.

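On the other hand, on many devices the maximum amount of shared memory a single block can use is equal to the total amount of shared memory per multiprocessor, so the "per block" and "per SM" figures are the same number. This does not hold on every architecture, though. The CUDA runtime reports the per-SM values in separate cudaDeviceProp fields (sharedMemPerMultiprocessor, regsPerMultiprocessor) alongside sharedMemPerBlock and regsPerBlock.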

The official CUDA Programming Guide also uses the term "shared memory per multiprocessor" in its Compute Capabilities section.