Computer Organization - How does "Predict taken"(always taken) branch prediction work?

Question

I understand how "predict untaken" works: the processor simply keeps fetching from PC+4. When the branch is resolved, if it turns out to be taken, the instructions fetched after the branch are flushed.

But I don't understand how "predict taken" works. I think the branch instruction needs to reach the decode stage (and the branch target address calculation needs to be complete) before the processor can predict that it will be taken, right?

Then how is "predict taken" implemented on a machine like the MIPS 5-stage pipeline, where the branch target address is calculated and the taken/not-taken decision is made in the ID (instruction decode) stage?

If the branch can be resolved in the ID stage, does that mean the prediction is done in the IF (instruction fetch) stage?

I'm confused because someone said that "predict taken" and "predict untaken" are called "static branch prediction" and that the compiler does all the work. So in the "predict taken" case, the compiler would insert the branch target instruction in the position right after the branch instruction.

Is my understanding correct, or is his statement correct?

Peter Cordes · Accepted Answer · 2019-09-17T06:06:14

MIPS has branch-delay slots, which hide branch latency on a simple 5-stage pipeline: trivially for unconditional branches (detected in ID, the stage after fetch), and even for conditional branches by evaluating them in the first half of EX, in time to forward to the 2nd half of IF. (MIPS I R2000 did that.)

But yes, completely avoiding fetch bubbles requires predicting the existence of branches before they're decoded, along with their target addresses. (Including for unconditional direct branches). Real predictors do that. See Slow jmp-instruction for an example on modern x86.

But that's very far from classic 5-stage RISC.

If you were putting such a dynamic predictor into a 5-stage RISC without branch-delay slots, e.g. a simple RISC-V, you'd maybe have it actually check ahead of where fetch is currently fetching, so you have a prediction for what to fetch in the next cycle.

You'd only use static always-taken prediction for conditional branches. (And usually only with a backwards displacement because those are often loop branches; predicting forward branches to be not-taken works well in practice, especially when compilers / programmers lay out their code accordingly so the common case for if()-type branches is not-taken). By the time you can detect that there's a branch at all, you already know if it's unconditional and don't need any prediction in that case.

If you don't already use tricks like MIPS I early eval of branch conditions, your branch latency would be 2 cycles (IF to EX) for conditional branches. Static always-taken prediction would shorten that to 1 cycle (IF to ID). Not 0, as you say, because the not-taken path is still being fetched while the branch instruction itself is being decoded.
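To make that cycle counting concrete, here's a minimal sketch. It assumes a simplified model where a redirect detected in some stage takes effect on the next cycle's fetch, so it ignores delay slots and the MIPS I half-cycle trick. `STAGES` and `fetch_trace` are hypothetical names for illustration, not part of any real simulator.

```python
# Simplified model: the branch is fetched in cycle 0 and moves one stage per cycle.
# Fetch keeps going sequentially until the redirecting stage sees the branch.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def fetch_trace(branch_pc, target, redirect_stage):
    """Addresses fetched on successive cycles for a taken branch.

    redirect_stage is the stage that resteers fetch (assumption: the
    new address is used by fetch in the following cycle).
    """
    k = STAGES.index(redirect_stage)  # cycles between IF and the redirect
    wrong_path = [branch_pc + 4 * i for i in range(1, k + 1)]
    return [branch_pc] + wrong_path + [target]

def wasted_fetches(redirect_stage):
    """Fall-through fetches that are discarded when the branch is taken."""
    return STAGES.index(redirect_stage)

for stage in ("EX", "ID"):
    trace = fetch_trace(0x100, 0x80, stage)
    print(stage, [hex(a) for a in trace], wasted_fetches(stage))
```

With the redirect in EX, two fall-through instructions are fetched before the target; with the redirect in ID, only one is.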

i.e. you could design the ID stage to resteer fetch for next cycle when it sees a conditional branch. (Possibly after checking the displacement for forwards / backwards, i.e. just the high bit of a 2's complement value.)
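A rough sketch of what that ID-stage check could look like, assuming a MIPS-style encoding (16-bit word displacement in the low bits, relative to PC+4) and assuming the instruction has already been identified as a conditional branch. `sign_extend16` and `static_predict` are made-up names for illustration.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a two's complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm

def static_predict(pc, instr):
    """Backward-taken / forward-not-taken guess for a conditional branch.

    pc    : address of the branch instruction
    instr : its 32-bit encoding (assumed MIPS-style I-type layout)
    Returns (predict_taken, target_address).
    """
    imm = instr & 0xFFFF
    predict_taken = bool(imm & 0x8000)  # just the high bit of the displacement
    target = (pc + 4 + (sign_extend16(imm) << 2)) & 0xFFFFFFFF
    return predict_taken, target

# beq $0, $0, -3 at 0x1000: backward, so predict taken and resteer fetch
print(static_predict(0x1000, 0x1000FFFD))
# beq $0, $0, +2 at 0x1000: forward, so keep fetching the fall-through path
print(static_predict(0x1000, 0x10000002))
```

If `predict_taken` is true, ID would hand `target` to fetch for the next cycle; otherwise fetch just continues sequentially.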

So you optimize for fall-through of forward branches and looping backward branches because those are relatively common. To do even better you'd use a cache of dynamic predictions that you index by address, or in various complex ways (e.g. TAGE uses recent branch history as part of the index, and see https://danluu.com/branch-prediction/ for historical progress from very simple to less simple predictors).
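As an illustration of the simplest kind of address-indexed dynamic predictor, here's a sketch of a table of 2-bit saturating counters. That's one classic scheme chosen for the example, not something a particular CPU is claimed to use, and it's far simpler than TAGE. The class name, table size, and initial counter value are all arbitrary assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low branch-address bits."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not-taken, 2-3 predict taken

    def _index(self, pc):
        # Drop the two always-zero low bits of a 4-byte-aligned address.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def count_mispredictions(predictor, pc, outcomes):
    """Predict, then train, for each actual outcome of one branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A loop branch: taken 9 times, then falls through, run twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(TwoBitPredictor(), 0x400, outcomes))
```

After warm-up, a loop branch like this mispredicts only on the loop exit, because a single not-taken outcome doesn't flip a saturated counter.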
