TL;DR: 3 options:
- Build a non-PIE executable (gcc -no-pie -fno-pie call-lib.c libcall.o) so the linker will generate a PLT entry for you transparently when you write call puts.
- call puts wrt ..plt like gcc -fPIE would do.
- call [rel puts wrt ..got] like gcc -fno-plt would do.
The latter two will work in PIE executables or shared libraries. The 3rd way, wrt ..got, is slightly more efficient.
Your gcc is building PIE executables by default (see "32-bit absolute addresses no longer allowed in x86-64 Linux?").
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts@plt. There is still a puts PLT entry generated, but the call doesn't go there.
At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fix up the call rel32. But the symbol is more than ±2^31 bytes away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address.)
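To make the overflow concrete, here is a small C sketch of the arithmetic. The helper names are made up for illustration, and it only models what a truncated 32-bit displacement does; it is not how the dynamic linker is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Displacement a 5-byte `call rel32` at call_addr needs to reach target.
   rel32 is relative to the end of the instruction (call_addr + 5). */
static int64_t call_disp(uint64_t call_addr, uint64_t target)
{
    return (int64_t)(target - (call_addr + 5));
}

/* True if that displacement is representable as a signed 32-bit value. */
static bool fits_rel32(uint64_t call_addr, uint64_t target)
{
    int64_t d = call_disp(call_addr, target);
    return d >= INT32_MIN && d <= INT32_MAX;
}

/* Where the call actually lands if only the low 32 bits of the
   displacement get stored in the instruction.
   (The uint32_t -> int32_t conversion is implementation-defined in C;
   gcc and clang wrap modulo 2^32, which is what we assume here.) */
static uint64_t truncated_target(uint64_t call_addr, uint64_t target)
{
    int32_t rel32 = (int32_t)(uint32_t)call_disp(call_addr, target);
    return call_addr + 5 + (uint64_t)(int64_t)rel32;
}
```

With hypothetical addresses of the kind you typically see on x86-64 Linux (a PIE mapped near 0x555555555140 and libc near 0x7ffff7e4a5a0), fits_rel32 returns false and truncated_target gives 0x5554f7e4a5a0: the low 32 bits match the real target, the upper bits don't.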
Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.
When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts@plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function).
Anyway, this is why gcc emits call puts@plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.
See "What does @plt mean here?" for more about the PLT, and also "Sorry state of dynamic libraries on Linux" from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)
BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo:
extern void foo(); in C (before C23) leaves the args unspecified, so gcc has to treat the call like extern void foo(...);
You could declare it as extern void foo(void);, which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.
asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).
You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.
Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.
If you want to make PIE executables (or shared libraries), you have two options:
- call puts wrt ..plt - explicitly call through the PLT.
- call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword.)
WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.
Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.
call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)
NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32 bits.
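For readers who think in C, a rough analogy for the GOT-indirect call is a call through a function pointer that was filled in before your code runs. This is only a sketch: puts_slot and foo_via_slot are hypothetical names, and a real GOT entry is written by the dynamic linker rather than by a C initializer.

```c
#include <stdio.h>

/* Stand-in for a GOT slot: a pointer-sized location that holds the
   address of puts. In a real PIE the dynamic linker fills this in. */
static int (*puts_slot)(const char *) = puts;

/* Equivalent in spirit to:
       lea  rdi, [rel message]
       call [rel puts wrt ..got]
   i.e. load the function pointer from memory and call through it. */
int foo_via_slot(void)
{
    return puts_slot("foo() called");
}
```

Calling foo_via_slot() prints the message via one memory-indirect call, with no PLT stub in between, which is the shape of code gcc -fno-plt produces for library calls.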
YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts@GOTPCREL(%rip), but NASM does not.
; don't use BITS 64. You *want* an error if you try to assemble this into a 32-bit .o
default rel ; RIP-relative addressing instead of 32-bit absolute by default; makes the [rel ...] optional
extern puts ; defined in libc, resolved at link time

section .rodata ; .rodata is best for constants, not .data
message:
    db 'foo() called', 0

section .text
global foo
foo:
    sub rsp, 8 ; align the stack by 16

    ; PIE with PLT
    lea rdi, [rel message] ; needed for PIE
    call puts wrt ..plt ; call through the PLT stub
;or
    ; PIE with -fno-plt style code, skips the PLT indirection
    lea rdi, [rel message]
    call [rel puts wrt ..got]
;or
    ; non-PIE
    mov edi, message ; more efficient, but only works in non-PIE / non-PIC
    call puts ; linker will rewrite it into call puts@plt

    add rsp, 8 ; restore the stack, undoing the sub
    ret
In a position-dependent Linux executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs. (Fun fact: macOS always puts the "image base" outside the low 4GiB so this optimization isn't possible there.)
In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.
(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)
If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6.
Linker "relaxing" calls when the target is present at static-link time
- GAS call *bar@GOTPCREL(%rip) uses R_X86_64_GOTPCRELX (relaxable)
- NASM call [rel bar wrt ..got] uses R_X86_64_GOTPCREL (not relaxable)
This is less of a problem with hand-written asm; you can just use call bar when you know the symbol will be present in another .o (rather than .so) that you're going to link. But C compilers don't know the difference between library functions and other user functions you declare with prototypes (unless you use stuff like gcc -fvisibility=hidden, see https://gcc.gnu.org/wiki/Visibility, or attributes / pragmas).
Still, you might want to write asm source that the linker can optimize if you statically link a library, but AFAIK you can't do that with NASM. You can export a symbol as hidden (visible at static-link time, but not for dynamic linking in the final .so) with global bar:function hidden, but that's in the source file defining the function, not files accessing it.
; 1st file: defines bar
global bar
bar:
    mov eax,231
    syscall

; 2nd file: calls bar both ways
    call bar wrt ..plt
    call [rel bar wrt ..got]
extern bar
The 2nd file, after assembling with nasm -felf64 and disassembling with objdump -drwc -Mintel to see the relocations:
0000000000000000 <.text>:
0: e8 00 00 00 00 call 0x5 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # 0xb 7: R_X86_64_GOTPCREL bar-0x4
After linking with ld (GNU Binutils) 2.35.1 - ld bar.o bar2.o -o bar
0000000000401000 <_start>:
401000: e8 0b 00 00 00 call 401010 <bar>
401005: ff 15 ed 1f 00 00 call QWORD PTR [rip+0x1fed] # 402ff8 <.got>
40100b: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
0000000000401010 <bar>:
401010: b8 e7 00 00 00 mov eax,0xe7
401015: 0f 05 syscall
Note that the PLT form got relaxed to just a direct call bar, PLT eliminated. But the ff 15 call [rel mem] was not relaxed to an e8 rel32.
With GAS:
_start:
call bar@plt
call *bar@GOTPCREL(%rip)
gcc -c foo.s && disas foo.o
0000000000000000 <_start>:
0: e8 00 00 00 00 call 5 <_start+0x5> 1: R_X86_64_PLT32 bar-0x4
5: ff 15 00 00 00 00 call QWORD PTR [rip+0x0] # b <_start+0xb> 7: R_X86_64_GOTPCRELX bar-0x4
Note the X at the end of R_X86_64_GOTPCRELX.
ld bar2.o foo.o -o bar && disas bar:
0000000000401000 <bar>:
401000: b8 e7 00 00 00 mov eax,0xe7
401005: 0f 05 syscall
0000000000401007 <_start>:
401007: e8 f4 ff ff ff call 401000 <bar>
40100c: 67 e8 ee ff ff ff addr32 call 401000 <bar>
Both calls got relaxed to a direct e8 call rel32 straight to the target address. The extra byte in the indirect call is filled with a 67 address-size prefix (which has no effect on call rel32), padding the instruction to the same length. (Because it's too late to re-assemble and re-compute all relative branches within functions, and alignment and so on.)
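If you want to check the disassembly by hand, the arithmetic is simple enough to sketch in C. The function names below are made up for illustration, and this only handles the two encodings shown in the listings (e8 rel32 with an optional 67 prefix, and ff 15 disp32); it is not a general x86 decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian signed 32-bit value from 4 bytes. */
static int32_t read_le32(const uint8_t *p)
{
    uint32_t raw = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)raw; /* assumes two's-complement wrap, as on gcc/clang */
}

/* Target of a direct `call rel32` (opcode e8) located at addr, optionally
   preceded by a 67 address-size prefix. Returns 0 for other encodings. */
static uint64_t direct_call_target(uint64_t addr, const uint8_t *insn, size_t len)
{
    size_t prefix = (len > 0 && insn[0] == 0x67) ? 1 : 0;
    if (len < prefix + 5 || insn[prefix] != 0xe8)
        return 0;
    uint64_t next = addr + prefix + 5;  /* rel32 is relative to the next insn */
    return next + (uint64_t)(int64_t)read_le32(insn + prefix + 1);
}

/* Address of the memory slot read by `call [rip + disp32]` (ff 15) at addr.
   Returns 0 for other encodings. */
static uint64_t indirect_call_slot(uint64_t addr, const uint8_t *insn, size_t len)
{
    if (len < 6 || insn[0] != 0xff || insn[1] != 0x15)
        return 0;
    return addr + 6 + (uint64_t)(int64_t)read_le32(insn + 2);
}
```

Feeding it the bytes from the listings: e8 f4 ff ff ff at 0x401007 and 67 e8 ee ff ff ff at 0x40100c both decode to 0x401000 (bar), while the unrelaxed NASM form ff 15 ed 1f 00 00 at 0x401005 reads its pointer from 0x402ff8, the .got slot.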
That would happen for call *puts@GOTPCREL(%rip) if you statically linked libc, with gcc -static.