1
votes

I have an object file of a C program that prints hello world, just for the sake of this question. Using readelf, gdb, or hexedit (I can't figure out which tool is the right one), I am trying to understand where in the file the code of the function "main" starts.

From readelf I know that the symbols _start and main exist, and the addresses at which they are mapped in virtual memory. I also know the size of the .text section and, of course, the entry point, which is the same address as the start of the .text section.

The question is: where in the file does the code of the function "main" start? I thought it would be at the entry point, i.e. at the offset of the .text section, but as I understand it the .data, .bss and .rodata sections should be loaded before main runs, yet they appear after the .text section in the readelf output.

I also thought we should sum the sizes of all the entries before main in the symbol table, but I am not at all sure that is correct.

A follow-up question: if I want to replace the main function with NOP instructions, or plant a single ret instruction in my object file, how can I find the offset at which to do that with hexedit?

2
You can just look at the section headers. Find which section contains the symbol's virtual address (should be .text), then subtract the start address of that section from the symbol's address, and finally add the section's file offset. - Jester
@Jester I am not sure I understand the formula. The address of my .text section is the same as the entry point (for example 0x80482e0), and the offset in the .text section header is 0002e0. So you mean I subtract the section address from the entry point, which is the same value, giving 0, and then add the offset, giving 2e0. Is that correct, or am I missing something? - John D
Yes, that is just a happy little accident (TM) because your symbol is right at the beginning of a section so you can just use the offset directly. - Jester
It's irrelevant what order the sections are in memory, or in the file. All of the loadable sections (segments actually) are mapped before the code begins to execute at the given entry point. Historically you want the .data and .bss at the end of your memory layout so you can grow your heap using brk. - Jester
Use objdump with the disassemble option, and maybe other options, too. - Erik Eidt

2 Answers

3
votes

So, let's go through it step by step.

Start with this C file:

#include <stdio.h>

void printit()
{
    puts("Hello world!");
}

int main(void)
{
    printit();
    return 0;
}

Since the comments suggest you are on x86, compile it as a 32-bit non-PIE executable like this:

$ gcc -m32 -no-pie  -o test test.c

The -m32 option is needed because I am working on an x86-64 machine. As you already know, you can get the virtual memory address of main using readelf, objdump or nm, for example like this:

$ nm test | grep -w main
0804918d T main

Obviously, 0x804918d cannot be an offset into a file that is only about 15 kB in size. You need to find the mapping between virtual memory addresses and file offsets. In a typical ELF file, this mapping is included twice: once in a detailed form for linkers (object files are also ELF files) and debuggers, and a second time in a condensed form that the kernel uses for loading programs. The detailed form is the list of sections, described by the section headers, and you can view it like this (the output is shortened a bit to make the answer more readable):

$ readelf --section-headers test
There are 29 section headers, starting at offset 0x3748:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[...]
  [11] .init             PROGBITS        08049000 001000 000020 00  AX  0   0  4
  [12] .plt              PROGBITS        08049020 001020 000030 04  AX  0   0 16
  [13] .text             PROGBITS        08049050 001050 0001c1 00  AX  0   0 16
  [14] .fini             PROGBITS        08049214 001214 000014 00  AX  0   0  4
  [15] .rodata           PROGBITS        0804a000 002000 000015 00   A  0   0  4
[...]
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

Here you find that the .text section starts at (virtual) address 0x08049050 and has a size of 0x1c1 bytes, so it ends just before address 0x08049211. The address of main, 0x0804918d, is in this range, so you know main lies in the .text section. If you subtract the base of the .text section from the address of main, you find that main is 0x13d bytes into the section. The section listing also contains the file offset where the data for the .text section starts. It's 0x1050, so the first byte of main is at file offset 0x1050 + 0x13d == 0x118d.
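The calculation above can be sketched in C. This is only an illustration: struct region and vaddr_to_offset are hypothetical names (not part of any ELF library), and the table has to be filled in by hand from the Addr, Off and Size columns that readelf prints.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the readelf listing (hypothetical helper type):
   where a region lives in memory and where its bytes sit in the file. */
struct region {
    uint32_t addr;   /* Addr column: virtual start address */
    uint32_t offset; /* Off column: file offset of the first byte */
    uint32_t size;   /* Size column: length in bytes */
};

/* Looks for the region that contains vaddr. On success stores the matching
   file offset in *out and returns 0; returns -1 if no region contains it. */
int vaddr_to_offset(const struct region *r, size_t n,
                    uint32_t vaddr, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        /* vaddr - addr is the distance into the region; it must be < size */
        if (vaddr >= r[i].addr && vaddr - r[i].addr < r[i].size) {
            *out = r[i].offset + (vaddr - r[i].addr);
            return 0;
        }
    }
    return -1;
}
```

With the .text row from the listing ({0x08049050, 0x001050, 0x1c1}) and the address 0x0804918d, the function stores 0x118d in *out. The same arithmetic applies to rows taken from the program headers, using VirtAddr, Offset and FileSiz.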

You can do the same calculation using program headers:

$ readelf --program-headers test
[...]
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00160 0x00160 R   0x4
  INTERP         0x000194 0x08048194 0x08048194 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x002e8 0x002e8 R   0x1000
  LOAD           0x001000 0x08049000 0x08049000 0x00228 0x00228 R E 0x1000
  LOAD           0x002000 0x0804a000 0x0804a000 0x0019c 0x0019c R   0x1000
  LOAD           0x002f0c 0x0804bf0c 0x0804bf0c 0x00110 0x00114 RW  0x1000
[...]

The second LOAD line tells you that the area from 0x08049000 (VirtAddr) to 0x08049228 (VirtAddr + MemSiz) is readable and executable, and is loaded from offset 0x1000 in the file. So again you can calculate that the address of main is 0x18d bytes into this load area, so it has to reside at offset 0x1000 + 0x18d == 0x118d inside the executable. Let's test that:

$ ./test
Hello world!
$ echo -ne '\xc3' | dd of=test conv=notrunc bs=1 count=1 seek=$((0x118d))
1+0 records in
1+0 records out
1 byte copied, 0.0116672 s, 0.1 kB/s
$ ./test
$

Overwriting the first byte of main with 0xc3, the opcode for a near return on x86, makes main return immediately, so the program no longer prints anything.
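If you would rather do the patching from a program than with dd or hexedit, the same overwrite might look roughly like the following C sketch. patch_bytes is a hypothetical helper name; the offset is whatever you computed for your own binary, and 0xc3 (ret) and 0x90 (the single-byte NOP) are the x86 opcodes the question asks about.

```c
#include <stdio.h>

/* Overwrites len bytes starting at file offset off with the byte value.
   The file is opened in update mode, so it is neither truncated nor
   extended beyond what is written.
   Returns 0 on success, -1 on any I/O error. */
int patch_bytes(const char *path, long off, unsigned char value, size_t len)
{
    FILE *f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    int rc = 0;
    if (fseek(f, off, SEEK_SET) != 0)
        rc = -1;
    for (size_t i = 0; rc == 0 && i < len; i++) {
        if (fputc(value, f) == EOF)
            rc = -1;
    }
    if (fclose(f) != 0)   /* fclose flushes; a failure here means data loss */
        rc = -1;
    return rc;
}
```

For the example above, patch_bytes("test", 0x118d, 0xc3, 1) would have the same effect as the dd command. Filling a whole function with 0x90 is possible in the same way, but execution then runs on into whatever bytes follow the function, so a single ret at the start is usually the safer experiment.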

2
votes

_start normally belongs to a fixed object module (a *.o file) that is supplied with the toolchain. Its name differs between systems, but a common one is crt0.o, and it is usually written in assembler. The execve(2) system call normally stores the arguments and the environment in the initial stack segment; the job of crt0.o is to prepare the initial C stack frame from that and call main(). Once main() returns, the startup code takes the return value of main, calls all the atexit() handlers, and finishes by calling the _exit(2) system call.

The linking of crt0.o is normally transparent, because you usually invoke the compiler to do the linking, so you don't have to add crt0.o as the first object module yourself; the compiler knows to do it. (Lately, all this startup code has grown considerably, since it depends on the architecture and on the ABI used to pass parameters between functions.)

If you run the compiler with the -v option, it prints the exact command line it uses to call the linker, which shows which startup modules are linked in and so helps explain the memory map your program has in its first stages.