1
votes

Why is branch prediction accurate? Can we generally think of it at a high level in terms of how certain branches of our code execute 99% of the time, while the rest is special cases and exception handling?

My question may be a little vague, but I am only interested in a high-level view on this. Let me give you an example.

Say you have a function with a parameter

void execute(Input param) { 
  assertNotEmpty(param);
  // ... main body, only reached when param is non-empty
}

I execute my function conditionally, given that the parameter isn't empty. 99% of the time this parameter will indeed be non-empty. Can I then think of neural-network-based branch prediction, for example, in such a way that, having seen this instruction flow countless times (such assertions are quite common), it will simply learn that most of the time the parameter is non-empty and take the branch accordingly?

Can we then think of our code in these terms: the cleaner, more predictable, or simply more common we make it, the easier we make things for the branch predictor?

Thanks!


3 Answers

2
votes

There are a couple of reasons that allow us to build good branch predictors:

  1. Bi-modal distribution - the outcome of branches is often bimodally distributed, i.e. an individual branch is often highly biased towards taken or not taken. If the outcomes of most branches were uniformly distributed, it would be impossible to devise a good prediction algorithm.

  2. Dependency between branches - in real-world programs, there is a significant amount of dependency between distinct branches, that is, the outcome of one branch affects the outcome of another branch. For example:

    if (var1 == 3)     // b1
        var1 = 0;
    if (var2 == 3)     // b2
        var2 = 0;
    if (var1 != var2)  // b3
        ...
    

    The outcome of branch b3 here depends on the outcome of branches b1 and b2. If both b1 and b2 are not taken (that is, their conditions evaluate to true and var1 and var2 are assigned 0), then branch b3 will be taken. A predictor that looks only at a single branch has no way to capture this behavior. Algorithms that examine this inter-branch behavior are called two-level (or correlating) predictors.

You didn't ask for any particular algorithms, so I won't describe any of them, but I'll mention the 2-bit prediction buffer scheme, which works reasonably well and is quite simple to implement (essentially, one keeps track of the outcomes of a particular branch in a small cache and makes a decision based on the current state of that entry). This scheme was implemented in the MIPS R10000 processor and the results showed prediction accuracy of ~90%.
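A minimal sketch of such a scheme, assuming a small table of 2-bit saturating counters indexed by the low bits of the branch address (illustrative C only; the table size is made up and real predictors are hardware structures, not software):

    #include <stdint.h>

    /* Each entry is a 2-bit saturating counter: 0,1 predict "not taken", 2,3 predict "taken". */
    #define ENTRIES 1024                               /* hypothetical size, a power of two */
    static unsigned char counters[ENTRIES];            /* each value stays in 0..3 */

    int predict(uintptr_t branch_addr) {
        return counters[branch_addr & (ENTRIES - 1)] >= 2;   /* 1 = predict taken */
    }

    void update(uintptr_t branch_addr, int taken) {
        unsigned char *c = &counters[branch_addr & (ENTRIES - 1)];
        if (taken  && *c < 3) (*c)++;                  /* strengthen towards "taken" */
        if (!taken && *c > 0) (*c)--;                  /* strengthen towards "not taken" */
    }

The 2-bit hysteresis means a single odd outcome (e.g. the last iteration of a loop) does not immediately flip the prediction.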

I'm not sure about the application of NNs to branch prediction - it does seem possible to design an algorithm based on NNs. However, I believe it wouldn't have any practical use, as: a) it would be too complex to implement in hardware (it would take too many gates and introduce a lot of delay); b) it wouldn't improve the predictor's performance significantly compared to traditional algorithms that are much easier to implement.

2
votes

A short history of how branches are predicted:

When Great-Granny was programming

there was no prediction and no prefetch. Soon she started prefetching the next instruction while executing the current instruction. Most of the time this was correct and improved the clocks per instruction in most cases by one, and otherwise nothing was lost. This already had an average misprediction rate of only 34% (9%-59%, H&P AQA p. 81).

When Granny was programming

there was the problem that CPUs were getting faster, and a decode stage was added to the pipeline, making it Fetch -> Decode -> Execute -> Write back. With about 5 instructions between branches, 2 fetches were lost every 5 instructions whenever a branch went the other way than the simple scheme assumed. A quick investigation showed that most conditional backward branches are loops and are mostly taken, while most forward branches are mostly not taken, as they tend to guard the bad cases. With profiling this gets the misprediction rate down to 3%-24%.
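As an aside, that static heuristic ("backward taken, forward not taken") boils down to a single comparison; a rough C sketch, purely for illustration:

    #include <stdint.h>

    /* Static prediction: backward branches (usually loop-closing) are assumed taken,
       forward branches (often error/exception paths) are assumed not taken. */
    int predict_static(uintptr_t branch_addr, uintptr_t target_addr) {
        return target_addr < branch_addr;              /* 1 = predict taken */
    }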

The advent of the dynamic branch predictor with the saturating counter

made life easier for the programmer. It builds on the observation that most branches do what they did last time: a table of saturating counters, indexed with the low bits of the branch's address, tells whether the branch was taken or not, and the Branch Target Buffer provides the address to be fetched. This local predictor lowers the misprediction rate to 1%-18%.
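The Branch Target Buffer itself can be pictured as a tiny cache from branch address to last seen target; a toy sketch with made-up sizes, purely for illustration:

    #include <stdint.h>
    #include <stddef.h>

    #define BTB_ENTRIES 1024                           /* hypothetical size, a power of two */
    static uintptr_t btb_tag[BTB_ENTRIES];             /* branch address, used as a hit check */
    static uintptr_t btb_target[BTB_ENTRIES];          /* last seen target for that branch */

    uintptr_t btb_lookup(uintptr_t branch_addr) {
        size_t i = branch_addr & (BTB_ENTRIES - 1);
        return btb_tag[i] == branch_addr ? btb_target[i] : 0;   /* 0 = no prediction */
    }

    void btb_update(uintptr_t branch_addr, uintptr_t actual_target) {
        size_t i = branch_addr & (BTB_ENTRIES - 1);
        btb_tag[i] = branch_addr;
        btb_target[i] = actual_target;
    }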

This is all good and fine, but some branches depend on how other, previous branches behaved. So if we keep a history of the last H branches as taken/not taken (1 and 0), we get 2^H different predictors depending on that history. In practice the history bits are XOR'ed with the branch's lower address bits, using the same counter array as in the previous version; see the sketch below.
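A rough sketch of that indexing (a gshare-style predictor), again with made-up sizes and purely for illustration:

    #include <stdint.h>
    #include <stddef.h>

    #define GENTRIES 4096                              /* hypothetical size, a power of two */
    static unsigned char gcounters[GENTRIES];          /* shared array of 2-bit counters (0..3) */
    static unsigned ghist;                             /* last branch outcomes as a bit string */

    static size_t gindex(uintptr_t branch_addr) {
        return (branch_addr ^ ghist) & (GENTRIES - 1); /* xor history with low address bits */
    }

    int predict_global(uintptr_t branch_addr) {
        return gcounters[gindex(branch_addr)] >= 2;    /* 1 = predict taken */
    }

    void update_global(uintptr_t branch_addr, int taken) {
        unsigned char *c = &gcounters[gindex(branch_addr)];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        ghist = ((ghist << 1) | (taken & 1)) & (GENTRIES - 1);  /* shift the outcome into the history */
    }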

The PRO of this is that the predictor can quickly learn patterns; the CON is that if there is no pattern, a branch will overwrite the bits of previous branches. The PRO outweighs the CON, as locality is more important than branches that are not in the current (inner) loop. This global predictor brings the misprediction rate down to 1%-11%.

That is great, but in some cases the local predictor beats the global predictor, so we want both. XOR-ing the local branch history with the address improves the local branch prediction, making it a two-level predictor as well, just with local instead of global branch history. Adding a third saturating counter for each branch, which counts which of the two was right, lets us select between them. This tournament predictor improves the misprediction rate by around 1 percentage point compared with the global predictor.
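The chooser can be sketched as one more table of 2-bit counters that is only nudged when the two predictors disagree (a toy illustration with hypothetical names and sizes, not how any particular CPU implements it):

    #include <stdint.h>
    #include <stddef.h>

    #define CH_ENTRIES 4096                            /* hypothetical size, a power of two */
    static unsigned char chooser[CH_ENTRIES];          /* 0,1 = prefer local; 2,3 = prefer global */

    int predict_tournament(uintptr_t branch_addr, int local_pred, int global_pred) {
        size_t i = branch_addr & (CH_ENTRIES - 1);
        return chooser[i] >= 2 ? global_pred : local_pred;
    }

    void update_chooser(uintptr_t branch_addr, int local_pred, int global_pred, int taken) {
        size_t i = branch_addr & (CH_ENTRIES - 1);
        if (local_pred == global_pred) return;         /* learn only when they disagree */
        if (global_pred == taken && chooser[i] < 3) chooser[i]++;
        if (local_pred  == taken && chooser[i] > 0) chooser[i]--;
    }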

Now your case is one branch in 100 going in the other direction.

Let's examine the local two-level predictor. When we get to the one case, the last H outcomes of this branch have all been in the same direction, let's say taken, making the history all 1's, so the branch predictor will have selected a single entry in the local predictor table, and that entry will be saturated towards taken. This means it will mis-predict the one case every time, and the next call, where the branch is taken again, will most likely be predicted correctly (barring aliasing of the branch table entry). So the local branch predictor can't catch this case, as a 100-bit-long history would require a 2^100-entry predictor.

Maybe the global predictor catches the case, then. In the last 99 cases the branch was taken, so the entries selected by the varying behaviour of the last H branches have all moved towards predicting taken. So if the last H branches behave independently of the current branch, then all the entries reached in the global branch prediction table will predict taken, and you will get a mis-predict.

But if a combination of previous branches, say the 3rd, 7th and 12th, acts so that the right combination of taken/not-taken foreshadows the opposite behaviour, then the branch prediction entry for that combination will correctly predict the behaviour of the branch. The problem is that if that entry is updated only seldom over the runtime of the program, and other branches alias to it with their own behaviour, it might fail to predict anyway.

Let's assume the global predictor actually predicts the right outcome based on the pattern of previous branches. Then you will most likely be misled by the tournament predictor, which says the local predictor is "always" right, while the local predictor will always mis-predict your case.

Note 1: The "always" should be taken with a grain of salt, as other branches might pollute your branch table entries by aliasing to the same entry. The designers have tried to make this less likely by having 8K different entries and by creatively rearranging the bits of the lower address of the branch.

Note 2: Other schemes might be able to solve this, but it is unlikely, as it is 1 case in 100.

0
votes

Many languages provide mechanisms to tell the compiler which branch is the expected result. This helps the compiler organise the code to maximise correct branch predictions. Examples: gcc's __builtin_expect and the likely/unlikely macros built on top of it.
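A small sketch of how this is commonly used in C with gcc/clang (the likely/unlikely names are the usual convention, e.g. in the Linux kernel, not part of the language; the empty-check is just an assumed example mirroring the question):

    #include <stddef.h>

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    void execute(const char *param) {
        if (unlikely(param == NULL || param[0] == '\0')) {
            /* rare path: the compiler can lay this out away from the hot code */
            return;
        }
        /* common path (99% of calls): laid out to fall through, which also suits
           static schemes that assume forward branches are not taken */
    }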