33
votes

From my university course, I heard, that by convention it is better to place more probable condition in if rather than in else, which may help the static branch predictor. For instance:

if (check_collision(player, enemy)) { // very unlikely to be true
    doA();
} else {
    doB();
}

may be rewritten as:

if (!check_collision(player, enemy)) {
    doB();
} else {
    doA();
}

I found a blog post Branch Patterns, Using GCC, which explains this phenomenon in more detail:

Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.

next to it, it says (emphasis mine):

When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.

Ultimately, there is article, written by Intel, Branch and Loop Reorganization to Prevent Mispredicts, which summarizes this with two rules:

Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:

  • A forward branch defaults to not taken
  • A backward branch defaults to taken

In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.

As I understand, the idea is that pipelined CPU may follow the instructions from the instruction cache without breaking it by jumping to another address within code segment. I am aware, though, that this may be largly oversimplified in case modern CPU microarchitectures.

However, it looks like GCC doesn't respect these rules. Given the code:

extern void foo();
extern void bar();

int some_func(int n)
{
    if (n) {
        foo();
    }
    else {
        bar();
    }
    return 0;
}

it generates (version 6.3.0 with -O3 -mtune=intel):

some_func:
        lea     rsp, [rsp-8]
        xor     eax, eax
        test    edi, edi
        jne     .L6            ; here, forward branch if (n) is (conditionally) taken
        call    bar
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret
.L6:
        call    foo
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret

The only way, that I found to force the desired behavior is by rewriting the if condition using __builtin_expect as follows:

if (__builtin_expect(n, 1)) { // force n condition to be treated as true

so the assembly code would become:

some_func:
        lea     rsp, [rsp-8]
        xor     eax, eax
        test    edi, edi
        je      .L2             ; here, backward branch is (conditionally) taken
        call    foo
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret
.L2:
        call    bar
        xor     eax, eax
        lea     rsp, [rsp+8]
        ret
2
stackoverflow.com/q/109710/905902 The linux kernel uses macros (alling the __builtin_expect) to use the a priory knowledge about the conditional branches.wildplasser
Modern Intel CPUs don't use static branch prediction. I also don't think GCC promises anywhere to consider the "true" clause of an if/else statement to be the most likely alternative. You're supposed to use __builtin_expect, like wildplasser mentioned, to tell it which is more likely. Or better yet, profile guided optimization.Ross Ridge
See Anger Fog's microarchitecture manual. Section 3.16 "Static Prediction in PM and Core 2": "These processors do not use static prediction. The predictor simply makes a random prediction the first time a branch is seen, depending on what happens to be in the BTB entry that is assigned to the new branch.". agner.org/optimizeRoss Ridge
Even in a full scale program its unlikely to matter. Unless you're using a processor with only static prediction most jumps are going to be dynamically predicted.Ross Ridge
For some reason, gcc's profile_estimate pass guesses that n has 54% chances of being 0... (see -fdump-tree-all-all) Normally it has a heuristic that == is more likely false, but it doesn't seem used here. You could file it on gcc's bugzilla to ask about it. Note that if you compile with -fprofile-generate, then run your program, then recompile with -fprofile-use, gcc will have access to real statistics and make better decisions.Marc Glisse

2 Answers

3
votes

The short answer: no, it is not.

GCC does metrics ton of non trivial optimization and one of them is guessing branch probabilities judging by control flow graph.

According to GCC manual:

fno-guess-branch-probability

Do not guess branch probabilities using heuristics.

GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (-fprofile-arcs). These heuristics are based on the control flow graph. If some branch probabilities are specified by __builtin_expect, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the __builtin_expect info into account. The interactions between the heuristics and __builtin_expect can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of __builtin_expect are easier to understand.

-freorder-blocks may swap branches as well.

Also, as OP mentioned the behavior might be overridden with __builtin_expect.

Proof

Look at the following listing.

void doA() { printf("A\n"); }
void doB() { printf("B\n"); }
int check_collision(void* a, void* b)
{ return a == b; }

void some_func (void* player, void* enemy) {
    if (check_collision(player, enemy)) {
        doA();
    } else {
        doB();
    }
}

int main() {
    // warming up gcc statistic
    some_func((void*)0x1, NULL);
    some_func((void*)0x2, NULL);
    some_func((void*)0x3, NULL);
    some_func((void*)0x4, NULL);
    some_func((void*)0x5, NULL);
    some_func(NULL, NULL);
    return 0;
}

It is obvious that check_collision will return 0 most of the times. So, the doB() branch is likely and GCC can guess this:

gcc -O main.c -o opt.a
objdump -d opt.a

The asm of some_func is:

sub    $0x8,%rsp
cmp    %rsi,%rdi
je     6c6 <some_func+0x18>
mov    $0x0,%eax
callq  68f <doB>
add    $0x8,%rsp
retq   
mov    $0x0,%eax
callq  67a <doA>
jmp    6c1 <some_func+0x13>

But for sure, we can enforce GCC from being too smart:

gcc -fno-guess-branch-probability main.c -o non-opt.a
objdump -d non-opt.a

And we will get:

push   %rbp
mov    %rsp,%rbp
sub    $0x10,%rsp
mov    %rdi,-0x8(%rbp)
mov    %rsi,-0x10(%rbp)
mov    -0x10(%rbp),%rdx
mov    -0x8(%rbp),%rax
mov    %rdx,%rsi
mov    %rax,%rdi
callq  6a0 <check_collision>
test   %eax,%eax
je     6ef <some_func+0x33>
mov    $0x0,%eax
callq  67a <doA>
jmp    6f9 <some_func+0x3d>
mov    $0x0,%eax
callq  68d <doB>
nop
leaveq 
retq  

So GCC will leave branches in source order.

I used gcc 7.1.1 for those tests.

-1
votes

I Think That You've Found A "Bug"

The funny thing is that optimization for space and no optimization are the only cases in which the "optimal" instruction code is generated: gcc -S [-O0 | -Os] source.c

some_func:
FB0:
       pushl   %ebp
       movl    %esp, %ebp
       subl    $8, %esp
       cmpl    $0, 8(%ebp)
       je      L2
       call    _foo
       jmp     L3
2:
       call    _bar
3:
       movl    $0, %eax
       # Or, for -Os:
       # xorl    %eax, %eax
       leave
       ret

My point is that ...


some_func:
FB0:
       pushl   %ebp
       movl    %esp, %ebp
       subl    $8, %esp
       cmpl    $0, 8(%ebp)
       je      L2
       call    _foo

... up to & through the call to foo everything is "optimal", in the traditional sense, regardless of the exit strategy.

Optimality is ultimately determined by the processor, of course.