With __builtin_expect, the code still contains a true test-and-branch instruction that the CPU has to evaluate. The hint only lets the compiler lay out the code so that instructions from the likelier path continue to be pre-fetched into the CPU pipeline.
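As a rough illustration of the __builtin_expect case, here is a minimal sketch. The likely()/unlikely() macros are modelled on the kernel's macros of the same name, and sum_buffer() is a hypothetical function invented for this example. The NULL check is still compiled into a real compare-and-branch; the hint only influences which path the compiler places on the straight-line route.

```c
#include <stddef.h>

/* Helper macros modelled on the kernel's likely()/unlikely();
 * defined locally here so the sketch is self-contained. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example function: sums a buffer and treats a NULL
 * buffer as a rare error case. */
long sum_buffer(const int *buf, size_t len)
{
    /* Cold path: the test is still executed on every call, but the
     * compiler is told that it is almost never true. */
    if (unlikely(buf == NULL))
        return -1;

    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

Calling sum_buffer() with a valid array takes the hinted fast path, while passing NULL takes the cold path and returns -1. Both results are the same as without the hint; only the code layout changes.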
static_key_* instead emits a NOP instruction that occupies enough space in the code to be patched later, at run-time, into a jmp <label>. This makes it possible to add debugging prints with essentially zero impact on the regular working case, i.e. while the code is not being debugged.
From the Linux kernel documentation for static-keys:
gcc (v4.5) adds a new 'asm goto' statement that allows branching to a
label:
gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.html
Using the 'asm goto', we can create branches that are either taken or
not taken by default, without the need to check memory. Then, at
run-time, we can patch the branch site to change the branch direction.
For example, if we have a simple branch that is disabled by default:
    if (static_key_false(&key))
        printk("I am the true branch\n");
Thus, by default the 'printk' will not be emitted. And the code
generated will consist of a single atomic 'no-op' instruction (5 bytes
on x86), in the straight-line code path. When the branch is 'flipped',
we will patch the 'no-op' in the straight-line code-path with a 'jump'
instruction to the out-of-line true branch. Thus, changing branch
direction is expensive but branch selection is basically 'free'. That
is the basic tradeoff of this optimization.
This low-level patching mechanism is called 'jump label patching', and it is the basis of static-keys.
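To make the compile-time half of this concrete, here is a minimal userspace sketch using 'asm goto'. It is not the kernel's implementation: sketch_static_key_false(), do_work() and debug_hits are hypothetical names, the sketch assumes GCC or a recent Clang on a target where "nop" is a valid mnemonic, and nothing here records the patch site or rewrites the instruction at run-time, so the branch stays disabled forever.

```c
#include <stdbool.h>

/* Hypothetical stand-in for static_key_false(). It emits a single
 * NOP and tells the compiler that the asm may jump to l_yes. Since
 * no code ever patches the NOP into a jump, this always returns
 * false in this sketch. */
static inline __attribute__((always_inline)) bool sketch_static_key_false(void)
{
    __asm__ goto("nop" : : : : l_yes);
    return false;
l_yes:
    return true;
}

/* Counts how often the debug path ran (hypothetical). */
int debug_hits = 0;

/* Hypothetical hot function with a disabled debug branch. */
int do_work(int x)
{
    /* No load and no compare here: the straight-line path contains
     * only the NOP. The increment below is reachable only through
     * the asm goto label. */
    if (sketch_static_key_false())
        debug_hits++;
    return x * 2;
}
```

In the kernel, the address of each such NOP is additionally recorded in a table, so that enabling the key can locate every site and rewrite it into a jump. That is the expensive 'flip' the documentation refers to.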