3
votes

It's my understanding that at the beginning of a processor's pipeline, the instruction pointer (which points to the address of the next instruction to execute) is updated by the branch predictor after fetching, so that this new address can then be fetched on the next cycle.

However, if the instruction pointer is modified early in the pipeline, wouldn't this affect instructions currently in the execute phase that rely on the old instruction pointer value? For instance, when doing a call, the current EIP needs to be pushed onto the stack, but wouldn't that value be corrupted when the instruction pointer is updated during branch prediction?

In many pipelined architectures the program counter the hardware uses internally is bogus; the one the software can see has the right value. There are several other instruction addresses used by the logic that does the real heavy lifting: one or more branch-prediction computations, the actual pointer sent to fetch memory, etc. Arm is a simple example: the program counter reading two instructions ahead has not reflected the real pipeline for a long while, since pipes are deeper with prediction, yet we still have an r15 that gives the as-designed instruction-set result. – old_timer
A usable (pseudo) register like EIP would have the correct value for the instruction set being used, independent of any latched or combinational addresses used for actual fetching. – old_timer

1 Answer

9
votes

You seem to be assuming that there's only one physical EIP register that's used by the whole CPU core.

That doesn't work because every instruction that could take an exception needs to know its own address. Or when an external interrupt arrives, the CPU could decide to service the interrupt after any instruction, making that one the architectural EIP. In long mode (x86-64), there are also RIP-relative addressing modes, so call isn't the only instruction that needs the current program-counter as data.

A simple pipelined CPU might have an EIP for each pipeline stage.

A modern superscalar out-of-order x86 associates an EIP (or RIP) with each in-flight instruction (or maybe each uop; but multi-uop instructions have all their uops associated with each other so an instruction can't partially retire.)

Unlike other parts of the architectural state (e.g. EFLAGS, EAX, etc.), the value is statically known after decode; in fact it's known even earlier than immediate values are, because instruction boundaries are detected in a pre-decode stage (or marked in L1i cache) so that multiple instructions can be fed to multiple decoders in parallel.

The early fetch/decode stage might just track addresses of 16-byte or 32-byte fetch blocks, but after decode I assume there's an address field in the internal uop representation. It might just be a small offset from the previous (to save space) for non-branch instructions, so if it's ever needed it can be calculated, but we're deep into implementation details here. Out-of-order execution maintains the illusion of instructions running in program-order, and they do issue and retire in-order (enter/leave the out-of-order execution part of the core).

Related: x86 registers: MBR/MDR and instruction registers makes a similar wrong assumption based on looking at toy CPUs. There is no "current instruction" register holding the machine code bytes either. See more links in my answer there for more about OoO / pipelined CPUs.


Branch prediction has to work before a block is even decoded: given that we just fetched a block at address abc, we need to predict which block to fetch next. That means prediction has to predict the existence (and targets) of jumps inside a 16-byte block of instructions that will only later be decoded in parallel.

Related: Why did Intel change the static branch prediction mechanism over these years?