5 votes

How "sticky" is the branch predictor logic? If code is being removed from the instruction caches, do the statistics stay with it?

Put another way, if the code is complex or not processing things in batch, is branch prediction still going to help?

Let's assume commodity Intel server hardware newer than 2011.

1
It's probably going to vary from processor to processor as Intel tweaks its algorithms/hardware, maybe even from stepping to stepping. I'm also pretty sure that Intel wouldn't reveal the specifics behind its branch predictor, as branch predictor performance is a huge part of overall processor performance and I'd imagine that'd be a closely guarded secret. – awksp
The instruction cache shouldn't have anything to do with it. There's a "cache" dedicated to storing branches and their histories, so it can track (thousands of?) different branches. There probably won't be any issues unless you overrun that one. – Mysticial
Please don't tag questions with irrelevant tags. This question is about Intel processor internals; it has nothing to do with Java. – Stephen C
@Mysticial Is that an answer? – Michael Deardeuff

1 Answer

8 votes

The exact workings of branch predictors will vary between processors. But nearly all non-trivial branch predictors need a history of the branches in the program to function.

This history is recorded in the branch history buffer.

These come in multiple flavors. The two most commonly studied (both sketched in the toy model after this list) are:

  • Local History - which tracks the history of each individual branch.
  • Global History - which tracks the combined history of all the branches.
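
To make the idea concrete, here is a toy sketch in C++ of both styles: a table of 2-bit saturating counters indexed by a hash of the branch address for the "local" flavor, and a gshare-style table indexed by the branch address XORed with a global history register for the "global" flavor. This is purely illustrative; real Intel predictors are far more elaborate and undocumented, and the 4096-entry table size is an arbitrary assumption.

    #include <cstdint>
    #include <cstddef>

    // Toy model of history-based prediction. Sizes and indexing are
    // arbitrary choices for illustration, not Intel's implementation.
    constexpr std::size_t TABLE_BITS = 12;
    constexpr std::size_t TABLE_SIZE = std::size_t(1) << TABLE_BITS;   // 4096 entries

    struct ToyPredictor {
        std::uint8_t  local_ctr[TABLE_SIZE]  = {};   // 2-bit saturating counters (0..3)
        std::uint8_t  gshare_ctr[TABLE_SIZE] = {};
        std::uint32_t global_hist = 0;               // recent outcomes, one bit per branch

        static std::size_t index(std::uint64_t pc) {
            return (pc >> 2) & (TABLE_SIZE - 1);     // hash of the branch address
        }
        std::size_t gshare_index(std::uint64_t pc) const {
            return (index(pc) ^ global_hist) & (TABLE_SIZE - 1);
        }

        bool predict_local(std::uint64_t pc) const  { return local_ctr[index(pc)] >= 2; }
        bool predict_global(std::uint64_t pc) const { return gshare_ctr[gshare_index(pc)] >= 2; }

        // Called once the branch actually resolves.
        void update(std::uint64_t pc, bool taken) {
            bump(local_ctr[index(pc)], taken);
            bump(gshare_ctr[gshare_index(pc)], taken);
            global_hist = ((global_hist << 1) | (taken ? 1 : 0)) & (TABLE_SIZE - 1);
        }

        static void bump(std::uint8_t& ctr, bool taken) {
            if (taken) { if (ctr < 3) ++ctr; }
            else       { if (ctr > 0) --ctr; }
        }
    };

Because the tables are indexed by a hash of the branch address, two hot branches can land on the same entry and overwrite each other's state. That aliasing is the toy analogue of the real history buffers running out of room.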

Modern processors will have multiple buffers for different purposes. In all cases, the buffers have a limited size. So when they run out of room, something will need to be evicted.

Neither Intel nor AMD gives details about their branch predictors. But it is believed that current processors from both companies can track thousands of branches along with their histories.


Getting back to the point: the data used by the branch predictors will "stick" for as long as it stays in the history buffers. So the performance of the predictors is best when the code is small and well-behaved enough not to overrun those buffers (a rough demonstration of this follows the list below).

  • If most of the computation is spent in a small amount of code, the local history buffers will be able to track all the branches that are commonly hit.
  • If the computation is all over the place, there may be too many branches for the branch predictor to track and thus its performance will degrade.
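
As a rough demonstration (assuming g++ or clang with optimizations; exact numbers depend on the CPU, and a compiler may still find a way to eliminate the branch), the sketch below takes the same branch tens of millions of times, driven once by a short repeating outcome pattern and once by an effectively random one. The short pattern fits comfortably in the predictor's history and stays well predicted; the long one does not, and the loop slows down accordingly.

    #include <chrono>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <vector>

    // One branch, two workloads: a short repeating outcome pattern vs. an
    // effectively random one. The volatile store discourages the compiler
    // from turning the branch into a conditional move, which would hide
    // the misprediction cost.
    static volatile std::uint64_t sink;

    static void measure(std::size_t period, std::size_t iters) {
        std::mt19937 rng(1);
        std::vector<std::uint8_t> pattern(period);
        for (auto& b : pattern) b = rng() & 1u;

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iters; ++i) {
            if (pattern[i % period])      // the branch under test
                sink = i;
        }
        auto t1 = std::chrono::steady_clock::now();

        std::cout << "period " << period << ": "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n";
    }

    int main() {
        const std::size_t iters = std::size_t(1) << 26;
        measure(16, iters);                    // short history: easy to learn
        measure(std::size_t(1) << 20, iters);  // effectively random: mispredicts often
    }

Keep in mind that the 1 MiB pattern in the second run also adds some cache traffic, so timings alone only give a rough picture; hardware counters (see below) separate the effects more cleanly.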

Note that the instruction and uop caches, while independent of the branch predictor, will exhibit the same effects. So it may be difficult to single out the branch predictor when attempting to construct test cases and benchmarks to study its behavior.
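
On Linux, hardware performance counters can help untangle the two: something along the lines of "perf stat -e branches,branch-misses,L1-icache-load-misses ./prog" (exact event names vary by kernel and CPU) reports branch mispredictions and instruction-cache misses side by side, so you can see which one a given test case is actually exercising.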

So this is yet another case where locality pays off in performance.