10
votes

I'm writing some audio code where basically everything is a tiny loop. Branch prediction failures as I understand them is a big enough performance issue that I struggle to keep the code branch free. But there is only so far that can take me, which got me wondering about the different kinds of branching.

In c++, the conditional branch to fixed target:

int cond_fixed(bool p) {
    if (p) return 10;
    return 20;
}

And (if I understand this question correctly), the unconditional branch to variable target:

struct base {
    virtual int foo() = 0;
};

struct a : public base {
    int foo() { return 10; }
};

struct b : public base {
    int foo() { return 20; }
};

int uncond_var(base* p) {
    return p->foo();
}

Are there performance differences? It seems to me that if one of the two methods were obviously faster than the other, the compiler would simply transform the code to match.

For those cases where branch prediction is of very high importance, what details regarding performance are useful to know?

EDIT: The actual operation of x : 10 ? 20 is merely a place holder. The actual operation following the branch is at least complex enough that doing both is inefficient. Additionally, if I had enough information to sensibly use __builtin_expect, branch prediction would be a non-issue in this case.

2
Which CPU ? Which compiler ? Did you check out the assembly to know which of the two strategies was picked ?Matthieu M.
Note: the compiler cannot transform the latter uncond_var because it does not know the complete set of possible derived classes of base. In general closed problems (finite number of possible inputs) are easier to solve than open ones.Matthieu M.
@MatthieuM. Compiler GCC, CPU anything from desktop to smartphones, though a modern desktop CPU is my current concern. Also, it seems odd to me that the compiler doesn't know all possible derived classes of base. It has all the source code, so this information exists. And no, I am not familiar enough with assembly to feel productive diving into such details. That's why I turn to this site, to hopefully get a higher level understanding from someone that knows such details.porgarmingduod
Regarding the CPU: some CPUs may not have predictors (or maybe only one kind); so the question is not meaningful for all CPUs. Desktop CPUs (x86/x86_64) should have both. Regarding the knowledge available to the compiler: in theory it could, in practice this information is only available if you look at the whole program at once. The compiler front-end (language aware) will not look at the whole program at once, and the optimizer (middle of the chain) might if you specify LTO (Link Time Optimization) or compile a static executable... but knows nothing about classes. Consider it will not happen.Matthieu M.

2 Answers

4
votes

Side note: if you have a code like

if (p) a = 20; else a = 10;

then there isn't any branch. The compiler is using a conditional move (see: Why is a conditional move not vulnerable for Branch Prediction Failure?)

1
votes

You didn't mention your compiler. I once used GCC for a performance critical application (a contest at my university actually) and I remember that GCC has the __builtin_expect macro. I went through all the conditions in my code and ended up with 5-10% speedup, which I found to be amazing, given the fact that I payed attention to pretty much everything I knew (memory-layout etc.) and that I didn't change anything regarding the algorithm itself.

The algorithm was a pretty basic depth-search by the way. And I ran it on a Core 2 Duo, not sure which ones though.