4
votes

When I'm programming on a normal day, I make sure that all branches are most likely not taken.

int retval = do_somting();
if(!retval) { /* Less-than-likely event*/ }

This optimists branch predictions, causing the CPU's predictor bit(s) set to "do not take". However do the predictor bit(s) get forced back into "take" after a for loop?

// prediction = "likely take"
if(false) { }

// prediction = "probably take"
if(false) { }

// prediction = "probably not take"
if(false) { }

// prediction = "likely not take"
if(false) { }

/* ... thousands of other if(false) that are speedy-fast */

for(int i = 0; i < 5; i++) { }
// prediction = "likely take"?

I know it's an unrealistic and minuscule optimization, but hey, the more you know.

EDIT: Let's assume GCC does not trash all of this code above, and let's also only talk about amd64 architecture. As I did not realize how low-level this question is.

2
I'm not sure if it will even end up being an optimisation, due to compiler and CPU optimisation.AntonH
Don't waste your time. The more you learn, the less you know :-)user1023602
Is your question "does a for loop use branches"? If so, the answer is yes.Kevin
Also branch predictors usually have many sets of bits. Taking or not taking a branch won't affect the prediction for all other branches. A simple predictor can look at a set of bits based on the location of the branch in the code (e.g. program counter mod # of entries in the predictor).Kevin
@SanchkeDellowar: I assume it is more a silicon than a silicone question. The latter are better suited at Playboy or Penthouse ;-)too honest for this site

2 Answers

3
votes

As it turns out branch prediction is depended on model of the CPU.

According to this paper, branch prediction is handled in countless amount of ways when relating loops to normal branches. Some CPUs have a separate predictor loops. So that means if statements do not at all effect the prediction of a for statement. Others they share the same prediction.

Regardless, there is not one true answer to this question. For loops are not to be measured when talking about branch efficiency.

...Unless of course you plan to run your program on only a single model of CPU.

1
votes

Most architectures with branch prediction (including AMD64) consider short downward/forward jumps/branches unlikely and short upward/backward branches/jumps likely. This means that most loops are predicted to continue looping. This makes a do-while loop fractionally more efficient than a for loop or while loop because of the initial conditional; however most optimizing compilers will optimize these cases to similar code where possible.

You can see the differences in assembly with gcc at -O3 optimization level by using a conditional with __builtin_expect(). The unlikely branch will typically be a forward jump while the likely condition(s) will either not branch at all or jump backward. This may involve inverting the logic. Note: at -O3, gcc will often duplicate code into the unlikely branch so that the branches in the likely cases can be minimized.

This makes sense because a loop that fits in a cache line will not have a cache miss if it branches to its beginning. Similarly, since the programs generally progress linearly forward within a function, it is also likely that recently executed code will already be in cache. When you replace a loop with a bunch of extra "optimized" conditionals, at some point (probably around 4 conditionals) the cache misses will override any miniscule benefits you may get at the cost of readability and maintainability.