I find some of the CUDA profiler metrics confusing. According to their definitions:
sm_efficiency: The percentage of time at least one warp is active on a multiprocessor, averaged over all multiprocessors on the GPU.
warp_execution_efficiency: Ratio of the average active threads per warp to the maximum number of threads per warp supported on a multiprocessor, expressed as a percentage.
achieved_occupancy: Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor.
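To make my reading of these definitions concrete, here is a minimal sketch that turns each one into arithmetic. Everything in it is an assumption on my part: the `SMSample` fields are hypothetical counters I made up (they are not real profiler counter names), the numbers are invented rather than measured, and `MAX_WARPS_PER_SM = 64` is just one possible hardware limit, which varies by architecture.

```python
from dataclasses import dataclass
from statistics import mean

WARP_SIZE = 32          # threads per warp on NVIDIA GPUs
MAX_WARPS_PER_SM = 64   # assumed limit; the real value depends on the architecture


@dataclass
class SMSample:
    """Hypothetical per-multiprocessor counters (made up for illustration)."""
    elapsed_cycles: int       # all cycles in the measurement window
    active_cycles: int        # cycles with at least one active warp
    active_warp_cycles: int   # number of active warps, summed over active cycles
    warp_instructions: int    # instructions issued, counted per warp
    thread_instructions: int  # same instructions, counted per active thread


def sm_efficiency(samples):
    # Fraction of time each SM has at least one active warp, averaged over SMs.
    return mean(100.0 * s.active_cycles / s.elapsed_cycles for s in samples)


def achieved_occupancy(samples):
    # Average active warps per active cycle, relative to the per-SM maximum.
    warps = sum(s.active_warp_cycles for s in samples)
    cycles = sum(s.active_cycles for s in samples)
    return warps / cycles / MAX_WARPS_PER_SM


def warp_execution_efficiency(samples):
    # Average active threads per warp instruction, relative to the warp size.
    threads = sum(s.thread_instructions for s in samples)
    warps = sum(s.warp_instructions for s in samples)
    return 100.0 * threads / (warps * WARP_SIZE)


# Case A: every SM is busy the whole time, but with a single full warp each.
case_a = [SMSample(1000, 1000, 1000, 500, 500 * 32)] * 4

# Case B: SMs are busy half the time, fully occupied, but only 8 of 32
# threads are active per warp instruction (heavy divergence).
case_b = [SMSample(1000, 500, 500 * 64, 800, 800 * 8)] * 4

for name, case in (("A", case_a), ("B", case_b)):
    print(name, sm_efficiency(case), achieved_occupancy(case),
          warp_execution_efficiency(case))
```

If this reading is right, the arithmetic alone does not tie the three numbers together: in case A the made-up counters give high sm_efficiency with very low achieved_occupancy, and in case B they give full occupancy with low warp_execution_efficiency. What I cannot tell is whether real kernels can actually produce such combinations.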
I wonder whether there is a general relation between these metrics. For example, does high occupancy always imply high warp execution efficiency, and so on? Or are they orthogonal, so that cases such as high SM efficiency with low occupancy are possible?
The first metric is measured over time, while the other two are ratios of thread and warp counts. Can someone clarify how they relate?