How far does the branch prediction go?

Question

I understand there is a branch predictor in modern CPU designs trying to guess which branch to go.

Assuming there is a jump instruction that will transfer control flow to either basic block A or basic block B.

If the predictor decides to go to A, when the actual calculation comes to the jump instruction, and finds out B should be the right choice instead of A, at this time, how far does the execution in basic block A go?

Are all the instructions in basic block A are done executed? Or just the first instruction is executed?

How can we find out the actual result and know more about the branch prediction strategies?

Peter Cordes Peter Cordes · Accepted Answer · 2019-08-06T04:52:14

The CPU assumes branch prediction was correct and continues unless/until it discovers it wasn't. (HW can't detect "basic blocks": it doesn't know when it reaches an address that some other instruction branches to. And you wouldn't want to stop anyway. Modern branch prediction good enough to be usable in an out-of-order CPU is usually correct like 95 to 99% of the time.)

Discovering a mispredict (or confirming a correct prediction) happens when the branch instruction itself is decoded (unconditional direct branch) or executed (conditional and/or indirect).

In case of a mispredict, the CPU has to re-steer the front-end (fetch/decode) to the correct path. On an in-order CPU, no instructions after a branch can execute until the branch itself executes, so it's always just a matter of restarting fetch/decode.

(In-order superscalar could actually execute an instruction after a branch, but an in-order pipeline makes it relatively easy to squash before it reaches write-back and actually changes architectural state. A store is probably the trickiest because you need to discard that store-buffer entry; its visible effect would be on memory, not write-back to registers. But anyway, that along with decoupling execution from cache-miss stores, and other reasons, is why even in-order pipelined CPUs have store buffers.)

Or for an out-of-order CPU with speculative execution that allows instruction from the wrong path to actually execute while a conditional or indirect branch is waiting for its input, it has to flush the back-end and restart issue from the correct path of execution.

(With fast-recovery and a branch-order-buffer, this can happen even if some of the instructions before the branch haven't finished executing yet. e.g. in a loop with a simple loop condition and a longer dependency chain in the loop body, so execution of the loop-condition dependency chain can run ahead and discover a mispredict in the last iteration when the loop branch falls through, without waiting until that instruction is ready to retire. i.e. without waiting for the loop body to execute that far.)

Multiple branches can be in flight at once. And of course any load or store can fault. An OoO exec CPU basically treats everything as speculative until retirement.

How can we ... know more about the branch prediction strategies?

https://danluu.com/branch-prediction/ is pretty good. See also the first chapter of Agner Fog's microarch guide (for x86) where he covers what real Intel and AMD CPUs do, as well as some background. https://agner.org/optimize/

How far does the branch prediction go?

1 Answers