Expanding on PolyThinker's answer, here's a concrete example.
int foo(int a, int b) {
if (a && b)
return foo(a - 1, b - 1);
return a + b;
}
i686-pc-linux-gnu-gcc-4.3.2 -Os -fno-optimize-sibling-calls
output:
00000000 <foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 55 08 mov 0x8(%ebp),%edx
6: 8b 45 0c mov 0xc(%ebp),%eax
9: 85 d2 test %edx,%edx
b: 74 16 je 23 <foo+0x23>
d: 85 c0 test %eax,%eax
f: 74 12 je 23 <foo+0x23>
11: 51 push %ecx
12: 48 dec %eax
13: 51 push %ecx
14: 50 push %eax
15: 8d 42 ff lea -0x1(%edx),%eax
18: 50 push %eax
19: e8 fc ff ff ff call 1a <foo+0x1a>
1e: 83 c4 10 add $0x10,%esp
21: eb 02 jmp 25 <foo+0x25>
23: 01 d0 add %edx,%eax
25: c9 leave
26: c3 ret
i686-pc-linux-gnu-gcc-4.3.2 -Os
output:
00000000 <foo>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 55 08 mov 0x8(%ebp),%edx
6: 8b 45 0c mov 0xc(%ebp),%eax
9: 85 d2 test %edx,%edx
b: 74 08 je 15 <foo+0x15>
d: 85 c0 test %eax,%eax
f: 74 04 je 15 <foo+0x15>
11: 48 dec %eax
12: 4a dec %edx
13: eb f4 jmp 9 <foo+0x9>
15: 5d pop %ebp
16: 01 d0 add %edx,%eax
18: c3 ret
In the first case, <foo+0x11>-<foo+0x1d>
pushes arguments for a function call, while in the second case, <foo+0x11>-<foo+0x14>
modifies the variables and jmp
s to the same function, somewhere after the preamble. That's what you want to look for.
I don't think you can do this programatically; there's too much possible variation. The "meat" of the function may be closer to or further away from the start, and you can't distinguish that jmp
from a loop or conditional without looking at it. It might be a conditional jump instead of a jmp
. gcc
might leave a call
in for some cases but apply sibling call optimization to other cases.
FYI, gcc's "sibling calls" is slightly more general than tail-recursive calls -- effectively, any function call where re-using the same stack frame is okay is potentially a sibling call.
[edit]
As an example of when just looking for a self-recursive call
will mislead you,
int bar(int n) {
if (n == 0)
return bar(bar(1));
if (n % 2)
return n;
return bar(n / 2);
}
GCC will apply sibling call optimization to two out of the three bar
calls. I'd still call it tail-call-optimized, since that single unoptimized call never goes further than a single level, even though you'll find a call <bar+..>
in the generated assembly.