Since you provided a definition of the function in the same translation unit, apparently GCC sees that the function doesn't care about stack alignment and doesn't bother much with it. And apparently this basic inter-procedural analysis / optimization (IPA) is on by default even at -O0.
Turns out this option even has an obvious name, which I found by searching for "ipa" options in the manual: -fipa-stack-alignment is on by default even at -O0. Manually turning it off with -fno-ipa-stack-alignment results in what you expected: a second sub whose value depends on the number of pushes (Godbolt), making sure ESP is aligned by 16 before a call, as modern Linux versions of the i386 SysV ABI require.
Or if you change the definition to just a declaration, then the resulting asm is as expected, fully respecting -mpreferred-stack-boundary.
void callee(void* a, void* b) {
}
to
void callee(void* a, void* b);
Using -fPIC also forces GCC to not assume anything about the callee: with that option, it has to respect the possibility of function interposition (e.g. via LD_PRELOAD).
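When not compiling for a shared library, GCC is allowed to assume that any definition it sees for a global function is the definition, thanks to ISO C's rule that a program can contain only one external definition of a function (C's counterpart of the C++ one-definition rule).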
If you use __attribute__((noipa)) on the function definition, then call sites won't assume anything based on the definition, just as if you'd renamed the definition (so you could still look at it) and provided only a declaration of the name the caller uses.
If you just want to stop inlining, you can use __attribute__((noinline,noclone)) instead. The call site can then still be compiled the way it would be if the optimizer had simply chosen not to inline but could still see this definition. That may or may not be what you want.
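A rough sketch of how the two attribute choices might look side by side. The names callee_noipa, callee_noinline, caller and call_count are made up for illustration and are not from your code. The attributes are GCC-specific, and as far as I know noipa needs GCC 8 or later.

```c
/* Sketch only: all names here are hypothetical. */
int call_count = 0;   /* gives the callees an observable side effect */

/* noipa: call sites have to treat this like a function whose body they
   can't see, even though the definition is in the same translation unit. */
__attribute__((noipa))
void callee_noipa(void *a, void *b)
{
    (void)a; (void)b;
    call_count++;
}

/* noinline,noclone: never inlined or cloned, but call sites may still be
   compiled using what the optimizer learns from this body. */
__attribute__((noinline, noclone))
void callee_noinline(void *a, void *b)
{
    (void)a; (void)b;
    call_count++;
}

void caller(void)
{
    int x, y;
    callee_noipa(&x, &y);
    callee_noinline(&x, &y);
}
```

Comparing the asm around the two calls in caller should show whether the stack adjustment before each one differs.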
See also How to remove "noise" from GCC/clang assembly output? re: writing functions whose asm is interesting to look at, and compiler options.
And BTW, I found it easiest to change the declaration / definition to variadic, so I could add or remove args with only a change to the caller. I was still able to reproduce your result: when there's a definition, the sub amount doesn't change even when the push amount changes with an extra arg, but with just a declaration it does.
void callee(void* a, ...) // {} // comment out a body or not
;
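For a complete file to experiment with, something like the following sketch could work. The DECL_ONLY macro, caller_variadic and callee_calls are names I made up, and the body here increments a counter only so the call is observable. Build with something like -m32 -O0 and compare the asm with and without -DDECL_ONLY (the declaration-only variant compiles to asm but won't link on its own).

```c
/* Sketch only: DECL_ONLY, callee_calls and caller_variadic are hypothetical. */
int callee_calls = 0;

#ifndef DECL_ONLY
/* Definition visible to the caller in the same translation unit. */
void callee(void *a, ...)
{
    (void)a;
    callee_calls++;
}
#else
/* Declaration only: the caller knows nothing about the body. */
void callee(void *a, ...);
#endif

void caller_variadic(void)
{
    int x, y, z;
    /* Add or remove args here to change the number of pushes. */
    callee(&x, &y, &z);
}
```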