2
votes

So I understand the basic techniques that are used in branch prediction for pipelined processors - stuff like 2-bit saturated counters, two level adaptive predictors, etc.

Here are my questions:

1) Branch target prediction: why is this important and what are some of the mechanisms used here? When I think of a branch I think "bne r2, r3, LABEL" which says that if r2 != r3 then branch to LABEL which means do PC (program counter) = PC + LABEL. What's so mysterious about predicting the target here? You know what it's going to be based on the compiled value of LABEL. I'm probably missing the point here somehow.

2) Why is the program counter value itself (e.g. 0x4001000C), or at least its last few bits, used as part of the branch prediction scheme? I saw a scheme where the last 4 bits of the PC were concatenated to the (4-bit) branch history register and that 8-bit value was used to access the pattern history table. I would think the PC is pretty arbitrary!

Thank you for any help understanding these issues

2

2 Answers

6
votes

What's so mysterious about predicting the target here? You know what it's going to be based on the compiled value of LABEL.

Because of the CPU pipeline depth and the cache latency, it will take many cycles between fetching an instruction, fully decoding it to identify the branch target, and being able to fetch that instruction. So you predict the target in order to pre-emptively fetch the next instruction.

Why is the program counter value itself (e.g. 0x4001000C), or at least its last few bits, used as part of the branch prediction scheme?

Becasue the PC uniquely identifies a particular branch instruction! How else are you going to index the branch-prediction table?

4
votes

To add a bit more color

1) Branch Target Prediction isn't so much for the example you gave. It is much more important for virtual functions and things like computed branches (for switch statements and the like). In both cases, the branch target is unknown at compile time. In the virtual function case, it must be loaded from a memory location (the virtual function table) and in the switch statement case, it us usually done by looking up the branch offset in a table. Even though the virtual function case is unconditional, it still uses the BTB heavily.

2) There are basically two general classes of predictors, those that are global and those that are per-address. Global predictors have the benefit of being able to work with much longer pattern histories (history of taken/not-taken). Per Address predictors have the benefit of being able to be specific to certain branch locations, but generally have shorter histories (because they're per address and thus take up lots of space). Many processors use what is often called a tournament branch predictor where there is both a global predictor and a per-address predictor and then there is a predictor (confidence estimator) to choose the predictor that is expected to give a better answer.