How can the number of instructions executed be greater than the number of instructions issued?

Question

Several online resources, including this one, state that the number of instructions executed + the number of replays = the number of instructions issued. If that is true, and if the number of replays is positive, how can a CUDA kernel produce the following event counts (from nvprof)?

Invocations       Avg       Min       Max       Event Name
1                 69161760  69161760  69161760  inst_executed
1                 37263115  37263115  37263115  inst_issued1
1                 19130919  19130919  19130919  inst_issued2

(inst_issued = inst_issued1 + inst_issued2 = 37263115 + 19130919; ratio = inst_executed/inst_issued > 1).

Is

inst_issued = inst_issued1 + inst_issued2

the correct formula for total number of instructions issued? Are there kernel-issued instructions other than *issued1 and *issued2? If so, how can they be profiled?

Online, I'm not seeing any obvious answers to my questions. For instance, my version of nvprof --query-events only yields the above three parameters as possible arguments to --events. There also seems to be no mention of this in the CUDA programming documentation, the link above, or any of the other ten or so links I've read up on that relate to CUDA instruction optimization.

Additional information:

0) I'm running CUDA 5.0, and compiling with nvcc -m64 -arch=sm_30.

1) I'm running a math-only version of my kernel, and since it has no register pressure, the number of global memory accesses is negligible.

2) I do not have access to the NVIDIA Visual Profiler, so I'm not sure whether it would give me answers different from those above.

Thanks a lot, and apologies in advance if this is silly.

Greg Smith · Accepted Answer · 2013-03-11T13:13:55

inst_issued2 is the number of issue slots in which 2 instructions were issued, so each count of inst_issued2 represents two instructions, not one.

inst_issued1: Number of single instruction issued per cycle

inst_issued2: Number of dual instructions issued per cycle

The formula for total instructions issued is:

inst_issued = (inst_issued2 * 2) + inst_issued1

Using the numbers in the questions gives:

inst_issued = (inst_issued2 * 2) + inst_issued1
            = (19130919 * 2) + 37263115
            = 75524953
ratio = inst_executed / inst_issued
      = 69161760 / 75524953
      ≈ 0.916
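
The arithmetic above can be double-checked with a small script. This is only a sketch: the function names are made up for illustration and are not part of nvprof or CUDA. The replay estimate assumes the relation quoted in the question (executed + replays = issued) holds for this kernel.

```python
# Hypothetical helpers for post-processing nvprof event counts.
# None of these names come from nvprof or the CUDA toolkit.

def total_issued(inst_issued1: int, inst_issued2: int) -> int:
    """Total instructions issued: each dual-issue slot counts as two instructions."""
    return inst_issued1 + 2 * inst_issued2


def executed_to_issued_ratio(inst_executed: int, inst_issued1: int, inst_issued2: int) -> float:
    """Ratio of executed to issued instructions; at most 1 if replays are non-negative."""
    return inst_executed / total_issued(inst_issued1, inst_issued2)


def implied_replays(inst_executed: int, inst_issued1: int, inst_issued2: int) -> int:
    """Replays implied by the assumed relation executed + replays = issued."""
    return total_issued(inst_issued1, inst_issued2) - inst_executed


if __name__ == "__main__":
    # Event counts taken from the nvprof output in the question.
    executed, issued1, issued2 = 69161760, 37263115, 19130919
    print("inst_issued    =", total_issued(issued1, issued2))
    print("ratio          =", round(executed_to_issued_ratio(executed, issued1, issued2), 3))
    print("implied replays =", implied_replays(executed, issued1, issued2))
```

With the dual-issue slots counted twice, the issued total exceeds the executed count, so the implied number of replays comes out positive, as the question expected.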
