6
votes

Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one Streaming Multiprocessor (SM). Say I have 4 blocks running on one SM and each block has 320 threads, i.e., 10 warps, so there are 40 warps on one SM. The occupancy is then 40/48, assuming the maximum number of warps on one SM is 48 (CC 2.x).
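The arithmetic above can be sketched as a small helper. This is only an illustration of the definition: the function name `occupancy` and its parameters are hypothetical, the default limit of 48 warps per SM is the CC 2.x figure quoted above, and the sketch assumes all the blocks really are resident on the SM (it ignores register and shared-memory limits).

```python
import math

def occupancy(blocks_per_sm, threads_per_block, max_warps_per_sm=48, warp_size=32):
    """Active warps divided by the maximum warps an SM supports.

    Assumes every block is resident; register and shared-memory
    limits are not modelled here.
    """
    # A partially filled warp still occupies a whole warp slot.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    active_warps = blocks_per_sm * warps_per_block
    return active_warps / max_warps_per_sm

# 4 blocks of 320 threads: 10 warps per block, 40 active warps, 40/48
print(occupancy(4, 320))
```

Note that the number of CUDA cores does not appear anywhere in the calculation.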

But in total I have 320 * 4 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I am using all the CUDA cores...

I am pretty sure I am missing something...

1
The machine needs more than one thread per core in order to hide latency and run at full speed. That is why it is possible to have multiple warps open simultaneously on one SM. Occupancy includes the warp executing plus all warps that are ready to execute. By the way, many CC 2.x SMs have 32 cores, not 48. You may want to read this description of the hardware multithreading architecture carefully. - Robert Crovella
Thanks. I took a look at the programming guide. So the execution of warps can be interleaved on the cores. A single warp does not fully utilize the cores. That is why we define occupancy in this way. - szli
@szli: depending on the architecture, a single warp (or two in a dual-issue architecture) can "fully utilize the cores" on a given SM. But it can't do it on every clock cycle. That is why many warps are needed and why occupancy is concerned with the number of warps per SM. - talonmies

1 Answer

10
votes

Because occupancy has nothing to do with cores. CUDA is a pipelined, SIMD-style architecture. Your 48 cores are fed per-warp instructions from a pipeline (dual-issued, in fact). You need a lot of warps to keep the instruction pipeline full; otherwise all the cores will stall. That is why occupancy is a somewhat useful metric for quantifying the ability of a given kernel to supply enough parallel work to achieve reasonable performance.
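To make the pipeline argument concrete, here is a toy model, not a description of real hardware. It assumes a single-issue scheduler and assumes that a warp which has just issued an instruction cannot issue again for a fixed number of cycles. The function name `issue_utilization` and the 20-cycle latency are purely illustrative assumptions.

```python
def issue_utilization(num_warps, latency, cycles=10_000):
    """Fraction of cycles in which the scheduler found a ready warp.

    Toy model: one instruction issued per cycle at most, and a warp
    that issues at cycle t is not ready again until cycle t + latency.
    """
    ready_at = [0] * num_warps  # cycle at which each warp may issue again
    issued = 0
    for cycle in range(cycles):
        for w in range(num_warps):
            if ready_at[w] <= cycle:
                ready_at[w] = cycle + latency
                issued += 1
                break  # single issue: stop after the first ready warp
    return issued / cycles

# With an assumed 20-cycle latency, more resident warps means fewer idle cycles.
for n in (1, 10, 40):
    print(n, issue_utilization(n, latency=20))
```

In this model the pipeline only stays busy on every cycle once there are at least as many resident warps as the latency in cycles, regardless of how many cores sit behind it.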