0
votes

Imagine you have two instructions in assembly:

movl $10, %ecx
movl 0(%eax), %edx

The CPI for a move is 1, and for a memory access it is 2.

For the first line, CPI = 1. For the second one, is the CPI 2 or 3? Do we add the memory access (2 cycles) to the cost of the move, or do we count only the memory access?


1 Answer

5
votes

Cycle counting doesn't really work anymore, ever since the Pentium 4 hit the market. Deep pipelines, three-level memory cache hierarchies, multiple execution units with out-of-order execution, branch prediction...

It is often possible to make a good guess about the timing of a larger piece of code, but for two isolated instructions it is virtually impossible (unless one of them happens to be DIV or IDIV, in which case we know it will be slow). The context is important because dependency chains play a big role: the longest chain (the critical path) tends to set the total time.
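To make the dependency-chain point concrete, here is a toy model in Python. It is only a sketch: the latencies are made-up illustrative numbers, not figures from any manual, and it assumes an idealized machine with unlimited execution units where an instruction starts as soon as its inputs are ready. The instruction names and both helper functions are hypothetical.

```python
# Toy model of instruction timing. All latencies are invented for
# illustration; real values depend on the CPU and the surrounding code.

def naive_cycles(latency):
    """Add up per-instruction costs, as simple cycle counting would."""
    return sum(latency.values())

def critical_path_cycles(latency, deps):
    """Length of the longest dependency chain, assuming an idealized
    out-of-order machine with unlimited execution units."""
    finish = {}

    def finish_time(name):
        if name not in finish:
            # An instruction starts once everything it depends on is done.
            start = max((finish_time(d) for d in deps.get(name, [])), default=0)
            finish[name] = start + latency[name]
        return finish[name]

    return max(finish_time(name) for name in latency)

# Hypothetical latencies in cycles.
latency = {"mov_imm": 1, "load": 4, "add": 1, "idiv": 25}

# Case 1: no instruction needs another's result, so they all overlap.
independent = {}
# Case 2: each instruction needs the result of the previous one.
chained = {"load": ["mov_imm"], "add": ["load"], "idiv": ["add"]}

print(naive_cycles(latency))                       # 31
print(critical_path_cycles(latency, independent))  # 25
print(critical_path_cycles(latency, chained))      # 31
```

In this model the naive sum only matches when every instruction waits for the previous one. The two instructions in the question do not depend on each other (one writes ECX, the other reads EAX and writes EDX), so on such an idealized machine they would simply overlap.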

In real code, your two instructions might well contribute nothing at all to the total timing, if they execute in the latency shadow of some other instruction. On the other hand, if the value addressed by EAX is not in any of the caches then it costs you hundreds of cycles, or many thousands if the data has to be paged in from disk...
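A rough way to see how much the cache matters is the usual weighted-average estimate for a load. The sketch below uses hypothetical numbers (a 4-cycle hit and a 300-cycle miss, picked only to match the "hundreds of cycles" order of magnitude) and a hypothetical helper name; it ignores multiple cache levels, overlap with other work, and paging.

```python
# Weighted-average cost of a load under a single-level cache model.
# HIT_CYCLES and MISS_CYCLES are assumptions chosen for illustration.

HIT_CYCLES = 4      # assumed cost when the data is in the cache
MISS_CYCLES = 300   # assumed cost when it has to come from main memory

def average_load_cycles(hit_rate, hit_cycles=HIT_CYCLES, miss_cycles=MISS_CYCLES):
    """Expected cycles per load for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles

for rate in (1.0, 0.99, 0.90):
    print(f"hit rate {rate:.2f}: about {average_load_cycles(rate):.1f} cycles per load")
```

Even with these made-up figures, a small drop in hit rate changes the average by far more than the 1 or 2 cycles the question is trying to account for.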

The current Intel® 64 and IA-32 Architectures Optimization Reference Manual contains everything that you need. It contains tables with cycle counts (latency and throughput) for most instructions, as well as several hundred pages of explanation why simple cycle counting doesn't work.