I wrote this Hello World program in C:

#include<stdio.h>

int main() {
  printf("Hello world !\n");
  return 0;
}

Compiling it to assembly with gcc, I get this:

    .file   "file.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "Hello world !"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB1:
    .section    .text.startup,"ax",@progbits
.LHOTB1:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB11:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $.LC0, %edi
    call    puts
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE11:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE1:
    .section    .text.startup
.LHOTE1:
    .ident  "GCC: (GNU) 4.9.2 20150304 (prerelease)"
    .section    .note.GNU-stack,"",@progbits

No problem here. But now I want to compare this assembly with the disassembly that objdump produces from the object file.

For the main function I get this:

0000000000000000 <main>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   bf 00 00 00 00          mov    $0x0,%edi
            5: R_X86_64_32  .rodata.str1.1
   9:   e8 00 00 00 00          callq  e <main+0xe>
            a: R_X86_64_PC32    puts-0x4
   e:   31 c0                   xor    %eax,%eax
  10:   48 83 c4 08             add    $0x8,%rsp
  14:   c3                      retq   

I don't understand two things:

Why does moving the number 0 into edi load the string "Hello world !"?

Moreover, the callq instruction calls address e. But the instruction at address e is not the function puts, it is the xor. So what is the real address?

That happens because of relocation, and here is a tutorial I've written on it: stackoverflow.com/a/30507725/895245 – Ciro Santilli Путлер Капут 六四事

1 Answer

The answer is that the zeroes are placeholders: the linker applies various fixups (relocations) to them later. When I run objdump -d hello.o on the unlinked object file I get this:

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   bf 00 00 00 00          mov    $0x0,%edi
   9:   e8 00 00 00 00          callq  e <main+0xe>
   e:   b8 00 00 00 00          mov    $0x0,%eax
  13:   5d                      pop    %rbp
  14:   c3                      retq   

However, the same function in the linked executable (an extract from objdump -d hello) looks like this:

400536: 55                      push   %rbp
400537: 48 89 e5                mov    %rsp,%rbp
40053a: bf e0 05 40 00          mov    $0x4005e0,%edi
40053f: e8 cc fe ff ff          callq  400410 <puts@plt>
400544: b8 00 00 00 00          mov    $0x0,%eax
400549: 5d                      pop    %rbp
40054a: c3                      retq   

The difference is that the zeroes standing in for the string address and for the call to puts have now been filled in by the linker. You can list those relocation entries with objdump -r hello.o:

hello.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000005 R_X86_64_32       .rodata
000000000000000a R_X86_64_PC32     puts-0x0000000000000004

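What that says is that the linker writes the final address of the string (which lives in .rodata) into the four bytes at offset 0x5, and writes the distance to puts into the four bytes at offset 0xa. The second entry is PC-relative (R_X86_64_PC32): call takes a displacement measured from the next instruction rather than an absolute address, and the -4 addend accounts for the four-byte field itself. That is also why the unrelocated object file shows callq e: a zero displacement points at the instruction right after the call, which is the one at 0xe. In the linked executable the call goes to puts@plt, a small stub through which the real puts in the C library is reached.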

This article on relocation describes the process in more detail and correctly points out that while some relocation happens at link-time, the loader can also supply relocation data.