Efficiency of CUDA program/device

Question

I find some CUDA profiler metrics confusing. According to their definitions:

sm_efficiency: The percentage of time at least one warp is active on a multiprocessor, averaged over all multiprocessors on the GPU.

warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor, expressed as a percentage.

achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor.

I wonder whether there is a general relation between these metrics. For example, does high occupancy always imply high warp execution efficiency, and so on? Or are they orthogonal, so that cases such as high SM efficiency with low occupancy are possible?

The first metric is about time, but the others are about numbers of threads and warps. Can someone clarify that?

einpoklum · Accepted Answer · 2019-06-01T21:35:53

The first and third metrics are very closely related and positively correlated. They are both about warps over time, except that the first metric applies a "> 0" test to the number of active warps. Other than that they are the same, but that test removes the "dimension" of the number of warps and gives you a 1/Time metric instead of a Warps/Time metric.
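To make the "> 0" point concrete, here is a minimal sketch of a toy model, not a profiler. It assumes a single SM, a made-up per-cycle trace of active warp counts, and an assumed limit `MAX_WARPS = 64`; the function names simply mirror the metric names and are not a real API. Going only by the definitions quoted in the question, it suggests that the two metrics are computed from the same data but can still take very different values.

```python
# Toy model (assumption): each list entry is the number of warps active on
# one SM during one cycle. Averaging over several SMs is ignored here.
MAX_WARPS = 64  # assumed per-SM warp limit; the real value depends on the GPU


def sm_efficiency(trace):
    """Fraction of cycles with at least one active warp (the "> 0" test)."""
    return sum(1 for warps in trace if warps > 0) / len(trace)


def achieved_occupancy(trace):
    """Average active warps per active cycle, relative to MAX_WARPS."""
    active = [warps for warps in trace if warps > 0]
    if not active:
        return 0.0
    return sum(active) / len(active) / MAX_WARPS


one_warp_always = [1] * 8      # a single warp keeps the SM busy every cycle
full_half_time = [64, 0] * 4   # the SM is full, but idle every other cycle

for name, trace in [("one_warp_always", one_warp_always),
                    ("full_half_time", full_half_time)]:
    print(name, sm_efficiency(trace), achieved_occupancy(trace))
```

In this model, the first trace gives an SM efficiency of 1.0 with an occupancy of 1/64, so high SM efficiency with low occupancy is possible. The second trace gives the reverse, because occupancy is averaged over active cycles only.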

As for the second metric, that has to do with divergence of threads within a warp. While it may correlate somewhat with the third metric, that's not obviously the case. What happens among the threads in a warp is almost orthogonal to what happens to different warps.
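The near-orthogonality can be illustrated with another small sketch under stated assumptions. Here each warp instruction is represented by its count of enabled lanes, the kernel shape (10 instructions per warp, half of them executed with 16 lanes when divergent) is hypothetical, and `instruction_trace` is an illustrative helper, not a real API. The metric is modelled as the average number of active threads per executed warp instruction divided by the warp size.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_execution_efficiency(active_thread_counts):
    """Average active threads per executed warp instruction, over WARP_SIZE."""
    return sum(active_thread_counts) / (len(active_thread_counts) * WARP_SIZE)


def instruction_trace(num_warps, divergent):
    """Hypothetical kernel: each warp issues 10 instructions; in the
    divergent case, 5 of them run with only 16 of the 32 lanes enabled."""
    per_warp = [32] * 5 + ([16] * 5 if divergent else [32] * 5)
    return per_warp * num_warps


for num_warps in (1, 64):
    for divergent in (False, True):
        eff = warp_execution_efficiency(instruction_trace(num_warps, divergent))
        print(num_warps, divergent, eff)
```

In this model the result is 1.0 without divergence and 0.75 with it, whether 1 or 64 warps run, which is the sense in which what happens inside a warp is independent of how many warps are resident.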
