I'm currently trying to improve the occupancy of my kernel, using the occupancy calculator together with the device information reported by the SDK sample deviceQuery. I've noticed that the two differ slightly in whether they attribute resources to a block or to a streaming multiprocessor (SM). In the SDK sample the values are labeled
total amount of shared memory per block
and
total number of registers available per block
But in the occupancy calculator these values are given per SM, which makes more sense to me.
Is that just a mislabeling in the SDK sample?
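
For reference, here is a minimal sketch of how both sets of limits could be printed side by side for comparison. It assumes a CUDA runtime recent enough that `cudaDeviceProp` also exposes the per-SM fields `sharedMemPerMultiprocessor` and `regsPerMultiprocessor` (older toolkits only have the per-block fields), and it assumes device 0 is the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the first device is the one being tuned

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device %d: %s\n", device, prop.name);

    // Limits that deviceQuery labels "per block".
    std::printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("Registers per block:     %d\n", prop.regsPerBlock);

    // Per-SM limits (assumption: these fields exist in the installed toolkit).
    std::printf("Shared memory per SM:    %zu bytes\n", prop.sharedMemPerMultiprocessor);
    std::printf("Registers per SM:        %d\n", prop.regsPerMultiprocessor);

    return 0;
}
```

Compiled with something like `nvcc query.cu -o query`, this would show whether the per-block and per-SM numbers coincide on a given card or not.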