1 vote

On architectures like x86_64, where it's possible to reference data (e.g. mov) and code (jmp, call) using PC(RIP)-relative addressing, is there really a technical reason that justifies the need for structures such as the GOT and PLT?

I mean, if I want to mov a global variable (for instance) into a register, I could use the following instructions (standard PIE):

mov rax,QWORD PTR [rip+0x2009db]

mov eax,DWORD PTR [rax]

(where 0x2009db is the offset between rip and the GOT entry containing the symbol's address)

And why couldn't we do something like this instead:

lea rax,[rip+0xYYYYYY]

mov eax,DWORD PTR [rax]

(0xYYYYYY being the direct delta between the RIP value and the symbol itself (e.g. a global variable))

I'm not used to writing ASM, so my example may be wrong. But my idea is: why not simply compute the absolute address of the symbol from RIP, put it in RAX, and then access its contents? If the instruction set lets us do whatever we want with relative addressing, why use structures like the GOT and PLT?

The same question applies to call/jmp instructions.

Is it because the instruction set does not allow it? Is it because the offset value cannot cover the entire address space? But does that matter? The layout of the sections is preserved once they are mapped into the virtual address space of a process (e.g. the .data section followed by .got, or something like that). I mean, why would the offset be any bigger when referring directly to the symbol's address rather than to its entry in the GOT? Is there another reason?

Thanks!


1 Answer

0 votes

Basically, the reason for these structures is precisely to have an extra level of indirection.

That extra level is what lets you interpose symbols from dynamic libraries with LD_PRELOAD. And even without it, the dynamic binding rules are such that a symbol defined in the executable overrides one defined in a shared library, even for calls made from within that library (see this).
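
To make that concrete, here is a rough sketch (Intel syntax, made-up offsets; the exact code depends on the compiler and flags) of how a compiler typically accesses a global that may be interposed versus one it knows cannot be:

mov rax,QWORD PTR [rip+0x2009db]   # preemptible (exported) global: load its address from its GOT entry
mov eax,DWORD PTR [rax]            # then dereference it

mov eax,DWORD PTR [rip+0x200a40]   # local/hidden global: direct RIP-relative load, no GOT involved

The first form is required whenever the definition that will actually be used is only known at load time; the second is exactly what the question proposes, and compilers do emit it when the symbol cannot be interposed (static, hidden visibility, or known to be resolved within the executable itself).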

Also, consider these points.

  1. The address at which a shared library holding the implementation of the called function gets loaded is not known beforehand (this is deliberate: the loader maps libraries at non-fixed addresses, and ASLR randomizes them), so the dynamic loader needs to apply relocations to at least all the call sites that are executed at run time.
  2. If the loader patched the code itself, this would kill the advantage of sharing the code segments of libraries mapped into memory by different process images, because the same library might be loaded at a different address in each process, leading to differently patched code. With the PLT/GOT scheme, only the GOT, a relatively small piece of per-process (non-shared) data, gets patched, while the code, including the PLT itself, stays unmodified and shared. See this post.
  3. The PLT allows functions to be bound lazily, upon first invocation. The GOT slot backing a PLT entry initially points back into the PLT, which jumps to the resolver; once resolution is done, the result is cached in that GOT slot, so subsequent calls go straight to the target (see the sketch below).
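
Here is roughly what a lazily-bound PLT entry looks like in a disassembly (made-up addresses; the exact layout varies across linkers):

foo@plt:
  jmp    QWORD PTR [rip+0x200982]   # jump through foo's GOT slot
  push   0x3                        # push foo's relocation index
  jmp    plt0                       # PLT0 pushes the link_map and jumps to the resolver

Initially foo's GOT slot points right back at the push instruction above, so the first call falls through to the resolver (_dl_runtime_resolve); the resolver writes foo's real address into the GOT slot, so every later call through foo@plt jumps straight to foo.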

The relocation mechanism for the GOT/PLT is covered here. All in all, there's plenty of information on the internet about how (and why) the PLT and GOT work.

Also, check out GCC's -fno-plt option. It is an optimization, but note that the GOT is still needed, and that lazy binding is not supported for functions without PLT entries.
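
As a sketch of the difference (made-up offset), a regular call goes through the PLT stub, while with -fno-plt the compiler emits the GOT-indirect call itself:

call   foo@plt                     # default: jump to the PLT stub, which does the GOT indirection
call   QWORD PTR [rip+0x2fe2]      # -fno-plt: load foo's address from its GOT entry and call it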