20
votes

I am writing a simple user-space ELF loader under Linux (why? for 'fun'). My loader at the moment is quite simple and is designed to load only statically-linked ELF files containing position-independent code.

Normally, when a program is loaded by the kernel's ELF loader, it is loaded into its own address space. As such, the data segment and code segment can be loaded at the correct virtual address as specified in the ELF segments.

In my case, however, I am requesting addresses from the kernel via mmap, and may or may not get the addresses requested in the ELF segments. This is not a problem for the code segment since it is position independent. However, if the data segment is not loaded at the expected address, code will not be able to properly reference anything stored in the data segment.

Indeed, my loader appears to work fine with a simple assembly executable that does not contain any data. But as soon as I add a data segment and reference it, the executable fails to run correctly or SEGFAULTs.

How, if possible, can I fixup any references to the data segment to point to the correct place? Is there a relocation section stored in the (static) ELF file for this purpose?

2
Is the reason mmap() is failing when giving the requested addresses due to the process' which is calling mmap already having those pages allocated in it's address space?Sean A.O. Harney
Yes, this is probably the reason. I thought about asking ld to place my code/data somewhere "out of the way", but I was wondering if a generic solution was possible first. If I don't get any responses here I might go ahead and try that.John Ledbetter

2 Answers

8
votes

If you modify the absolute addresses available in the .got section, (global offset table) your program should work. Make sure to modify the absolute address calculation to cater for the new distance between .text and .data, I'm afraid you need to figure out where this information comes from, for your architecture.

See this: Global Offset Table (Processor-Specific)

Good luck.

4
votes

I don't see any way you can do that, unless you emulate the kernel-provided virtual address space completely, and run the code inside that virtual space. When you mmap the data section from the file, you are intrinsically relocating it to an unknown address of the virtual address space of your ELF interpreter, and your code will not be able to reference to it in any way.

Glad to be proven wrong. There's something very cool to learn here.