Branch target prediction in conjunction with branch prediction?

Question

EDIT: My confusion arises because surely, by predicting which branch is taken, you are effectively doing the target prediction too?

This question is intrinsically linked to my first question on the topic:

branch prediction vs branch target prediction

Looking at the accepted answer:

Unconditional branch, fixed target

Infinite loop

goto statement

break or continue statement

End of the 'then' clause of an if/else statement (to jump past the else clause)

Non-virtual function call

Unconditional branch, variable target

Returning from a function

Virtual function call

Function pointer call

switch statement (if compiled into a jump table)

Conditional branch, fixed target

if statement

switch statement (if compiled into a series of if/else statements)

Loop condition tests

The && and || operators

The ternary ?: operator

Conditional branch, variable target

Less likely to show up under normal conditions, but the compiler may synthesize one as an optimization, combining two of the above cases. For example, on x86, the compiler may optimize code like if (condition) { obj->VirtualFunctionCall(); } into a conditional indirect jump like jne *%eax if it appears at the end of a function due to tail call optimization.
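To make the categories above concrete, here is a rough C++ sketch that tags each construct with the kind of branch it would typically become. All names (Shape, twice, count_even, apply, sides_of) are made up for illustration, and the comments are an assumption about straightforward, unoptimized code generation: a real compiler may inline the calls, replace an if with a conditional move, or devirtualize the virtual call.

```cpp
#include <cassert>

// Hypothetical types and functions, for illustration only. Which branch
// instructions are actually emitted depends on the compiler, the target
// and the optimization level.

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square   : Shape { int sides() const override { return 4; } };

// Calling this directly: unconditional branch, fixed target.
int twice(int x) { return 2 * x; }

int count_even(const int* v, int n) {
    int c = 0;
    for (int i = 0; i < n; ++i) {   // loop test: conditional branch, fixed target
        if (v[i] % 2 == 0) {        // if: conditional branch, fixed target
            ++c;
        }
    }
    return c;                       // return: unconditional branch, variable target
}

// Call through a function pointer: unconditional branch, variable target.
int apply(int (*f)(int), int x) { return f(x); }

// Virtual call: unconditional branch, variable target (depends on the object).
int sides_of(const Shape& s) { return s.sides(); }
```

For the fixed-target cases the destination is encoded in the instruction itself, so only the taken/not-taken direction is unknown; for the variable-target cases the destination comes from a register or memory, so the address itself has to be guessed.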

If I have the following code:

if(something){
    //a
}
else{
    //b
}

(BP = "Branch Prediction" and BTP = "Branch Target Prediction")

It's pretty obvious BP is used to predict the outcome of the conditional something. However, I am trying to understand whether BTP is also involved in determining what happens in branch a. Does BTP also determine the address of the code located at branch a/b, depending on the result of the BP?

I ask because on this Wikipedia page (http://en.wikipedia.org/wiki/Branch_target_predictor):

In computer architecture, a branch target predictor is the part of a processor that predicts the target of a taken conditional branch or an unconditional branch instruction before the target of the branch instruction is computed by the execution unit of the processor.

it suggests BTP is used to predict the target after the conditional has been predicted.

1) Could somebody clarify the above please?

A second, related question: how do BP and BTP differ in the way they interact with the fetch/decode/execute/write-back pipeline of the CPU? Does BP begin at the fetch or decode stage? After the execution stage of the conditional code we can check whether the prediction was correct and update the branch prediction cache.

2) How does BTP work with regards to the fetch/decode/execute/write-back CPU stages?

Hans Passant · Accepted Answer · 2014-03-23T16:17:11

Do read along with the Intel optimization manual; the current download location is here. If that link goes stale (they move stuff around all the time), search the Intel site for "Architectures optimization manual". Keep in mind that the info there is fairly generic: they disclose only as much as is needed to allow writing efficient code. Branch prediction implementation details are considered a trade secret and do change between architectures. Search the manual for "branch prediction" to find references; they are spread across the chapters.

I'll give a summary of what's found in the manual, adding details where appropriate:

Branch prediction is the job of the BPU (Branch Prediction Unit) in the core. It roughly corresponds to "BP" in your question. It contains several sub-units:

The branch history table. This table keeps track of previously taken conditional branches and is consulted by the predictor to decide whether a branch is likely to be taken. It is fed with entries by the instruction retirement unit, the one that knows whether the branch was actually taken. This is the sub-unit that has changed the most as the architectures improved, getting deeper and smarter as more real estate became available.
The BTB, Branch Target Buffer. This buffer stores the target address of a previously taken indirect jump or call. This corresponds to "BTP" in your question. The manual does not state whether the buffer can store multiple targets per address, indexed by the history table; I consider it likely for later architectures.
The Return Stack Buffer. This buffer acts as a "shadow" stack, storing the return address for CALL instructions. It makes the target of a RET instruction available with a high degree of confidence without the processor having to rely on the BTB, which is unlikely to be as effective for calls. It is documented to be 16 levels deep.
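The way these three sub-units divide the work can be sketched with a toy C++ model. This is a minimal sketch under loud assumptions, not a description of any Intel design: the names (ToyBpu, Prediction, next_fetch), the 256-entry tables, the modulo indexing and the 2-bit saturating counters are all invented for illustration. Only the 16-entry return stack depth is taken from the text above. The model also consults the BTB for any predicted-taken branch, which is a simplification.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of a target lookup: `hit` is false when nothing is known.
struct Prediction {
    bool hit;
    std::uint64_t target;
};

// Toy model only; sizes, indexing and update policy are assumptions.
class ToyBpu {
public:
    // Direction ("BP"): 2-bit saturating counter, values 2 and 3 mean "taken".
    bool predict_taken(std::uint64_t pc) const { return counters_[index(pc)] >= 2; }

    // Called once the real outcome is known (at retirement in the text above).
    void update_direction(std::uint64_t pc, bool taken) {
        std::uint8_t& c = counters_[index(pc)];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
    }

    // Target ("BTP"): the last target recorded for this branch address.
    Prediction predict_target(std::uint64_t pc) const {
        const Entry& e = btb_[index(pc)];
        if (e.valid && e.tag == pc) return Prediction{true, e.target};
        return Prediction{false, 0};
    }
    void update_target(std::uint64_t pc, std::uint64_t target) {
        btb_[index(pc)] = Entry{true, pc, target};
    }

    // Return stack: push the return address on CALL, pop it on RET.
    void on_call(std::uint64_t return_address) {
        if (rsb_.size() == kRsbDepth) rsb_.erase(rsb_.begin());  // drop the oldest
        rsb_.push_back(return_address);
    }
    Prediction on_ret() {
        if (rsb_.empty()) return Prediction{false, 0};
        Prediction p{true, rsb_.back()};
        rsb_.pop_back();
        return p;
    }

private:
    struct Entry { bool valid; std::uint64_t tag; std::uint64_t target; };
    static constexpr std::size_t kEntries = 256;   // assumed table size
    static constexpr std::size_t kRsbDepth = 16;   // depth quoted in the text
    static std::size_t index(std::uint64_t pc) {
        return static_cast<std::size_t>(pc % kEntries);
    }
    std::array<std::uint8_t, kEntries> counters_{};  // all 0: strongly not taken
    std::array<Entry, kEntries> btb_{};              // all entries invalid
    std::vector<std::uint64_t> rsb_;
};

// Next fetch address for a conditional branch at `pc`: the direction
// prediction picks between the fall-through address and the BTB target.
std::uint64_t next_fetch(const ToyBpu& bpu, std::uint64_t pc, std::uint64_t fallthrough) {
    if (!bpu.predict_taken(pc)) return fallthrough;
    Prediction p = bpu.predict_target(pc);
    return p.hit ? p.target : fallthrough;
}
```

In this model the direction guess and the target guess are separate lookups: predict_taken answers "will it jump?", while predict_target and on_ret answer "where to?". A plain if/else only needs the first, because its target is fixed in the instruction; an indirect jump, call or return needs the second.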

Question 2) is a bit difficult to answer accurately; the manual only talks about the "Front End" and does not break down the details of the pipeline. Appropriately enough, since it is heavily architecture dependent. The diagram in section 2.2.5 is possibly illustrative. The execution trace cache plays a role: it stores previously decoded instructions, so it is the primary source of BPU consultations. Otherwise the BPU is consulted right after the instruction translator (aka decoder).
