From my university course, I heard, that by convention it is better to place more probable condition in if
rather than in else
, which may help the static branch predictor. For instance:
if (check_collision(player, enemy)) { // very unlikely to be true
doA();
} else {
doB();
}
may be rewritten as:
if (!check_collision(player, enemy)) {
doB();
} else {
doA();
}
I found a blog post Branch Patterns, Using GCC, which explains this phenomenon in more detail:
Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.
next to it, it says (emphasis mine):
When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.
Ultimately, there is article, written by Intel, Branch and Loop Reorganization to Prevent Mispredicts, which summarizes this with two rules:
Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:
- A forward branch defaults to not taken
- A backward branch defaults to taken
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.
As I understand, the idea is that pipelined CPU may follow the instructions from the instruction cache without breaking it by jumping to another address within code segment. I am aware, though, that this may be largly oversimplified in case modern CPU microarchitectures.
However, it looks like GCC doesn't respect these rules. Given the code:
extern void foo();
extern void bar();
int some_func(int n)
{
if (n) {
foo();
}
else {
bar();
}
return 0;
}
it generates (version 6.3.0 with -O3 -mtune=intel
):
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
jne .L6 ; here, forward branch if (n) is (conditionally) taken
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
.L6:
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
The only way, that I found to force the desired behavior is by rewriting the if
condition using __builtin_expect
as follows:
if (__builtin_expect(n, 1)) { // force n condition to be treated as true
so the assembly code would become:
some_func:
lea rsp, [rsp-8]
xor eax, eax
test edi, edi
je .L2 ; here, backward branch is (conditionally) taken
call foo
xor eax, eax
lea rsp, [rsp+8]
ret
.L2:
call bar
xor eax, eax
lea rsp, [rsp+8]
ret
__builtin_expect
, like wildplasser mentioned, to tell it which is more likely. Or better yet, profile guided optimization. – Ross Ridge-fdump-tree-all-all
) Normally it has a heuristic that == is more likely false, but it doesn't seem used here. You could file it on gcc's bugzilla to ask about it. Note that if you compile with-fprofile-generate
, then run your program, then recompile with-fprofile-use
, gcc will have access to real statistics and make better decisions. – Marc Glisse