4
votes

This question is mostly a follow up after reading this article by Aater Suleman on improving branch prediction from the software side. The author provides a way to "unroll" conditional statements such to increase the probability of predicting a branch taken in the case of a 2-bit saturated counter scheme. Here's an exerpt:

Let me explain with an example. Lets suppose that X is random variable between 0 and 99. I want to run the following code:

if (X > 5 && X < 95) //branch is taken 90% of the time
do_something();

However, if I write the code as:

if(X > 5) //branch is taken 95% of the time
if(X < 95) //branch is taken 95% of the time
do_something();

The branch predictor has a better shot at predicting both these branches more accurately which may lead to better performance because the counters assigned to both these branches are more likely to stay saturated at taken (because two not-taken are less likely).

In general, any time you are ANDing/ORing conditions in an if-statement, you should think if the combination will be more biased or less biased and pick the version which is more biased.

My question is: do compilers follow this heuristic all of the time? Would a compiler even have the jurisdiction to do something like this since compilers exist in the scope of ISA's and architectures and branch prediction schemes exist in the scope of processors and more specific hardware implementations?

My intuition is that it would not hurt performance to expand control statements in such a way but at the same time I have not been able to find any evidence that compilers make such optimizations. If that is the case, why don't they? What am I missing in my reasoning and can someone please provide an example where such an optimization would be detrimental with respect to a certain architecture or prediction scheme?

Thanks.

2
Just leave this sort of thing to your compiler - whose writers are in constant contact with the CPU architects on which they compile, and who are aware of many issues that you are not likely to be. Focus on writing architecturally sound code and you'll get far more benefit than worrying about these niggling details that are likely to cause you to write crappy code. Seriously - focus on something useful.Mordachai

2 Answers

4
votes

It seems that Suleman is not aware that

if (X > 5 && X < 95)
  do_something();

and

if(X > 5)
  if(X < 95)
    do_something();

are semantically equivalent in C and C++. The latest C standard states (6.5.13/4):

Unlike the bitwise binary & operator, the && operator guarantees left-to-right evaluation; if the second operand is evaluated, there is a sequence point between the evaluations of the first and second operands. If the first operand compares equal to 0, the second operand is not evaluated.

A compiler may or may not generate the same code for both code samples. Rewriting your program to use one or the other form to improve performance is not a good idea. Results would depend heavily on the compiler version, ISA, and probably more variables. Leave this type of optimization to the compiler, but give the compiler the information it needs.

Some compilers like GCC and LLVM allow you to give explicit hints like this:

if (__builtin_expect(X > 5, 1)) {
  // This block is likely to be taken.
}
if (__builtin_expect(X <= 5, 0)) {
  // This block is unlikely to be taken.
}

Another way is to use profile guided optimization. The first step requires one or more test runs of your program to generate a database with statistical information about branch instructions. In a second step the compiler can then use this database to optimize your program. See your compiler manual for more details.

1
votes

You can avoid the two branches by replacing this code with

if (((unsigned int) X) - 6 < 88)
    do_something ();

The __builtin_expect obviously cannot change whether a condition is true or false. It's mostly used for processors where a taken branch is slower than a branch that is not taken. So the compiler would compile

if (__builtin_expect (condition, 0))
    statement;

morestuff ();

to this:

if (condition) goto elsewhere;
backhere: morestuff ();

....

elsewhere: statement; goto backhere;

so that usually no branch is being taken, instead of the usual

if (! condition) goto nextstatement;
statement;
nextstatement: morestuff ();

But in the question, things don't work that way. if (cond1 && cond2) will not generate a single branch (only in special cases with a very clever compiler), so the comparison isn't valid. And if it was, two branches with 95% correct prediction are not better than one branch with 90%, because the number of mispredictions will be the same.