According to the profiler user guide:
flop_count_sp: Number of single-precision floating-point operations executed by non-predicated threads (add, multiply and multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count. The count does not include special operations.
inst_fp_32: Number of single-precision floating-point instructions executed by non-predicated threads (arithmetic, compare, etc.)
For my kernel, the profiler output adds up to something like:
flop_count_sp = flop_count_sp_add + flop_count_sp_mul + 2 * flop_count_sp_fma
inst_fp_32 = flop_count_sp_add + flop_count_sp_mul + flop_count_sp_fma
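To make the relationship concrete, here is a small Python sketch of how I am combining the counters. The function name `sp_metrics` and the input counts are made up for illustration and are not real profiler output. It also assumes the kernel issues only add, mul and fma; the guide says inst_fp_32 also covers compares and other instructions, so the second sum would not hold in general.

```python
from typing import Tuple


def sp_metrics(n_add: int, n_mul: int, n_fma: int) -> Tuple[int, int]:
    """Return (flop_count_sp, inst_fp_32) under the add/mul/fma-only assumption."""
    # Each FMA contributes two floating-point operations (a multiply and an add).
    flop_count_sp = n_add + n_mul + 2 * n_fma
    # Each FMA is still a single instruction.
    inst_fp_32 = n_add + n_mul + n_fma
    return flop_count_sp, inst_fp_32


# Hypothetical counts, chosen only to show the arithmetic.
flops, insts = sp_metrics(n_add=100, n_mul=50, n_fma=200)
print(flops, insts)  # 550 350
```

With these made-up inputs, the gap between the two values is exactly the FMA count, which matches what I observe in my own profile.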
Given the numbers in these metrics, I am wondering what counts as an operation and what counts as an instruction here. It seems that an fma is one instruction but two operations, whereas add and mul are each one instruction and one operation. Since the profiler counts SASS instructions, are there any instructions that are not counted as operations, or vice versa? I only want to know this in the context of nvprof and nvvp metrics.
Also, when we talk about peak performance in TFLOP/s, I assume the OP stands for operations rather than instructions? If I want to estimate something like the compute to global memory access (CGMA) ratio, should I use flop_count_sp instead of inst_fp_32 for the compute part? Thanks in advance.