0
votes

Occupancy in CUDA is defined as

occupancy = active_warps / maximum_active_warps

What is the difference between a resident CUDA warp and an active one?

From my research on the web, it seems that a block is resident (i.e. allocated along with its registers and shared memory) on an SM for the entire duration of its execution. Is there a difference between that and "being active"?

If I have a kernel that uses very few registers and little shared memory, does that mean I can have maximum_active_warps resident blocks and achieve 100% occupancy, since occupancy just depends on the amount of register/shared memory used?

1
Might be related to this question. - Taro

1 Answer

2
votes

What is the difference between a resident CUDA warp and an active one?

In this context presumably nothing.

From my research on the web, it seems that a block is resident (i.e. allocated along with its registers and shared memory) on an SM for the entire duration of its execution. Is there a difference between that and "being active"?

Now you have switched from asking about warps to asking about blocks. But again, in this context, no: you can consider them to be the same.

If I have a kernel that uses very few registers and little shared memory, does that mean I can have maximum_active_warps resident blocks and achieve 100% occupancy, since occupancy just depends on the amount of register/shared memory used?

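No, because a warp and a block are not the same thing. As you yourself have quoted, occupancy is defined in terms of warps, not blocks. The maximum number of resident warps per SM is fixed at 48 or 64, depending on your hardware. The maximum number of resident blocks per SM is fixed at 8, 16 or 32, again depending on hardware. These are two independent limits, and both can influence the effective occupancy a given kernel can achieve. For example, with 32-thread blocks (one warp each) on hardware limited to 8 blocks and 48 warps per SM, at most 8 of the 48 warp slots can be filled, however few registers the kernel uses.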
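To make the interaction of the two limits concrete, here is a minimal sketch of the arithmetic in Python. It assumes the kernel's register and shared memory usage is low enough never to be the limiting factor, so only the per-SM warp and block limits matter. The function name `theoretical_occupancy` and its parameters are hypothetical, chosen for illustration; they are not part of any CUDA API.

```python
def theoretical_occupancy(threads_per_block, max_warps_per_sm,
                          max_blocks_per_sm, warp_size=32):
    """Estimate occupancy from the per-SM warp and block limits only.

    Assumption: registers and shared memory are NOT limiting, so the
    number of resident blocks is bounded only by the two hardware limits.
    """
    # A partially filled warp still occupies a whole warp slot, so round up.
    warps_per_block = -(-threads_per_block // warp_size)

    # How many whole blocks fit before the warp limit is exhausted.
    blocks_by_warp_limit = max_warps_per_sm // warps_per_block

    # The stricter of the two independent limits wins.
    resident_blocks = min(max_blocks_per_sm, blocks_by_warp_limit)

    active_warps = resident_blocks * warps_per_block
    return active_warps / max_warps_per_sm


if __name__ == "__main__":
    # Tiny blocks: the block limit (8) caps the SM at 8 of 48 warps.
    print(theoretical_occupancy(32, max_warps_per_sm=48, max_blocks_per_sm=8))
    # Larger blocks: 6 blocks of 8 warps fill all 48 warp slots.
    print(theoretical_occupancy(256, max_warps_per_sm=48, max_blocks_per_sm=8))
```

With 32-thread blocks the block limit is hit first and occupancy is 8/48; with 256-thread blocks the warp limit is hit first and occupancy reaches 100%. For a real kernel, the CUDA runtime function `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the resident block count with register and shared memory usage taken into account.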