2 votes

I ran into some slowdowns in a tight loop today caused by an if statement, which surprised me somewhat because I expected branch prediction to successfully pipeline the particular statement and minimize the cost of the conditional.

When I sat down to think about why it wasn't handled better, I realized I didn't know much about how the branch prediction was being handled at all. I know the concept of branch prediction quite well, and its benefits, but the problem is that I didn't know who was implementing it and what approach they were using to predict the outcome of a conditional.

Looking deeper, I know branch prediction can be done at a few levels:

  1. The hardware itself, via instruction pipelining
  2. A C++-style (ahead-of-time) compiler
  3. The interpreter of an interpreted language
  4. A half-compiled language like Java, which may do both 2 and 3 above

However, because optimization can be done in many places, I'm left uncertain how to anticipate branch prediction. If I'm writing in Java, for example, is my conditional optimized when compiled, when interpreted, or by the hardware after interpretation? More interesting, what does this mean if someone uses a different runtime environment? Could a different branch prediction algorithm used in a different interpreter result in a tight loop built around a conditional showing significantly different performance depending on which interpreter it's run with?

Thus my question: how does one generalize an optimization around branch prediction if the software could be run on very different computers, which may mean different branch prediction? If the hardware and the interpreter can change their approach, then profiling and using whichever approach proved fastest isn't a guarantee. Let's ignore C++, where you have compile-level ability to force this, and look at interpreted languages where someone still needs to optimize a tight loop.

Are there certain presumptions that are generally safe to make regardless of the interpreter used? Does one have to dive into the intricate specification of a language to make any meaningful presumption about branch prediction?

This is a bit broad. Very generally, analyze and get an average of the hardware it'll run on and try to optimize from that. – edmz
I wouldn't target an interpreted language, as these have "hidden overhead" that may involve branches over which you have no control. – Yves Daoust
About the only "portable" measure you can take is to avoid conditional branches when you can. See stackoverflow.com/a/17828251/1196549 – Yves Daoust
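
To make that comment concrete, here is a minimal branchless sketch (an illustration of the general idea only, not the code from the linked answer; the function names and threshold parameter are made up for the demo):

    #include <cstddef>

    // Branchy version: prediction quality depends entirely on the data.
    long sum_above_branchy(const int *a, std::size_t n, int threshold) {
        long s = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (a[i] > threshold)
                s += a[i];
        return s;
    }

    // Branchless version: the comparison becomes a 0/1 value, so every
    // iteration executes the same instructions regardless of the data,
    // leaving nothing for the branch predictor to get wrong.
    long sum_above_branchless(const int *a, std::size_t n, int threshold) {
        long s = 0;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i] * (long)(a[i] > threshold);
        return s;
    }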

2 Answers

2 votes

Short answer:

To help improve the performance of the branch predictor, try to structure your program so that conditional statements don't depend on apparently random data.

Details

One of the other answers to this question claims:

There is no way to do anything in the high-level language to optimize for branch prediction. Caching, sure, sometimes you can, but branch prediction, no, not at all.

However, this is simply not true. A good illustration of this fact comes from one of the most famous questions on Stack Overflow.

All branch predictors work by identifying patterns of repeated code execution and using this information to predict the outcome and/or target of branches as necessary.

When writing code in a high-level language, it's typically not necessary for an application programmer to worry about trying to optimize conditional branches. For instance, gcc has the __builtin_expect built-in, which allows the programmer to specify the expected outcome of a conditional branch. But even if an application programmer is certain they know the typical outcome of a specific branch, it's usually not necessary to use the annotation. In a hot loop, using this directive is unlikely to help improve performance: if the branch really is strongly biased, the predictor will be able to correctly predict the outcome most of the time even without the programmer's annotation.
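
For illustration, here is a minimal sketch of how that annotation is typically used with gcc or clang (the likely/unlikely wrapper macros and the sum_valid function are a common convention chosen for the demo, not part of any particular codebase):

    #include <cstddef>

    // The hint tells the compiler which outcome to lay out as the hot,
    // fall-through path; it does not reprogram the hardware branch predictor.
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    long sum_valid(const int *data, std::size_t n) {
        long total = 0;
        for (std::size_t i = 0; i < n; ++i) {
            // Hint (not a guarantee) that negative values are rare, so the
            // compiler keeps the accumulation as the straight-line path.
            if (unlikely(data[i] < 0))
                continue;          // rare skip path
            total += data[i];      // common path
        }
        return total;
    }

As noted above, on a strongly biased branch the hardware predictor will usually reach the same conclusion on its own, so in practice this rarely changes the measured time of a hot loop.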

On most modern processors branch predictors perform incredibly well (better than 95% accurate even on complex workloads). So as a micro-optimization, trying to improve branch prediction accuracy is probably not something that an application programmer would want to focus on. Typically the compiler is going to do a better job of generating optimal code that works for the specific hardware platform it is targeting.

But branch predictors rely on identifying patterns, and if an application is written in such a way that patterns don't exist, then the branch predictor will perform poorly. If the application can be modified so that there is a pattern then the branch predictor has a chance to do better. And that is something you might be able to consider at the level of a high-level language, if you find a situation where a branch really is being poorly predicted.
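
As an illustration, here is a minimal, self-contained sketch along the lines of the famous Stack Overflow question referenced above (the array size, iteration count, and the 128 threshold are arbitrary choices for the demo):

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <vector>

    // With random data, the branch in the inner loop is taken about half the
    // time with no pattern, so the predictor mispredicts frequently. Sorting
    // the data first turns the branch into one long run of "not taken"
    // followed by one long run of "taken", which a modern predictor handles
    // almost perfectly.
    int main() {
        std::vector<int> data(1 << 20);
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> dist(0, 255);
        for (int &x : data) x = dist(rng);

        // Toggle this line and compare timings: the work is identical,
        // only the predictability of the branch changes.
        std::sort(data.begin(), data.end());

        long long sum = 0;
        auto start = std::chrono::steady_clock::now();
        for (int pass = 0; pass < 100; ++pass)
            for (int x : data)
                if (x >= 128)      // the branch whose predictability we control
                    sum += x;
        auto stop = std::chrono::steady_clock::now();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        std::printf("sum=%lld time=%lld ms\n", sum, (long long)ms);
        return 0;
    }

One caveat: with aggressive optimization the compiler may replace this particular branch with a conditional move or vectorize the loop, which hides the effect, so the difference is most visible when the branch actually survives into the generated code.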

0 votes

Branch prediction, like caching and pipelining, is something done to make code run faster in general, overcoming bottlenecks in the system (super slow cheap DRAM, which all DRAM is; all the layers of busses between X and Y; etc.).

There is no way to do anything in the high-level language to optimize for branch prediction. Caching, sure, sometimes you can, but branch prediction, no, not at all. In order to predict, the core has to have the branch in the pipe along with the instructions that precede it, and across architectures and implementations it is not possible to find one rule that works. Often not even within one architecture and implementation, working from the high-level language.

You could also easily end up in a situation where, by tuning for branch prediction, you de-tune for the cache or the pipe or other optimizations you might want to use instead. And overall performance is first and foremost application specific; after that it is something tuned to that application, not something generic.

For as much as I like to preach and do optimizations at the high-level-language level, branch prediction is one that falls into the premature optimization category. Just enable it in the core if it is not already enabled; sometimes it saves you a couple of cycles, most of the time it doesn't, and depending on the implementation it can cost more cycles than it saves. Like a cache, it comes down to hits vs. misses: if it guesses right you have code in a faster RAM sooner on its way to the pipe, if it guesses wrong you have burned bus cycles that could have been used by code that was actually going to run.

Caching is usually a benefit (although it is not hard to write high-level code that shows it costing performance instead of saving it), as code usually runs linearly for some number of instructions before branching. Likewise, data is accessed in order often enough to overcome the penalties. Branching is not something we do on every instruction, and where we branch to does not have a common answer.

Your backend could try to tune for branch prediction by having the pre-branch decisions happen a few cycles before the branch, but all within the depth of the pipe and tuned for fetch-line or cache-line alignment. Again, this messes with tuning for other features in the core.