496
votes

I have read about GCC's Options for Code Generation Conventions, but could not understand what "Generate position-independent code (PIC)" does. Please give an example to explain me what does it mean.

6

6 Answers

603
votes

Position Independent Code means that the generated machine code is not dependent on being located at a specific address in order to work.

E.g. jumps would be generated as relative rather than absolute.

Pseudo-assembly:

PIC: This would work whether the code was at address 100 or 1000

100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL CURRENT+10
...
111: NOP

Non-PIC: This will only work if the code is at address 100

100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL 111
...
111: NOP

EDIT: In response to comment.

If your code is compiled with -fPIC, it's suitable for inclusion in a library - the library must be able to be relocated from its preferred location in memory to another address, there could be another already loaded library at the address your library prefers.

64
votes

I'll try to explain what has already been said in a simpler way.

Whenever a shared lib is loaded, the loader (the code on the OS which load any program you run) changes some addresses in the code depending on where the object was loaded to.

In the above example, the "111" in the non-PIC code is written by the loader the first time it was loaded.

For not shared objects, you may want it to be like that because the compiler can make some optimizations on that code.

For shared object, if another process will want to "link" to that code he must read it to the same virtual addresses or the "111" will make no sense. but that virtual-space may already be in use in the second process.

50
votes

Code that is built into shared libraries should normally be position-independent code, so that the shared library can readily be loaded at (more or less) any address in memory. The -fPIC option ensures that GCC produces such code.

23
votes

Adding further...

Every process has same virtual address space (If randomization of virtual address is stopped by using a flag in linux OS) (For more details Disable and re-enable address space layout randomization only for myself)

So if its one exe with no shared linking (Hypothetical scenario), then we can always give same virtual address to same asm instruction without any harm.

But when we want to link shared object to the exe, then we are not sure of the start address assigned to shared object as it will depend upon the order the shared objects were linked.That being said, asm instruction inside .so will always have different virtual address depending upon the process its linking to.

So one process can give start address to .so as 0x45678910 in its own virtual space and other process at the same time can give start address of 0x12131415 and if they do not use relative addressing, .so will not work at all.

So they always have to use the relative addressing mode and hence fpic option.

20
votes

The link to a function in a dynamic library is resolved when the library is loaded or at run time. Therefore, both the executable file and dynamic library are loaded into memory when the program is run. The memory address at which a dynamic library is loaded cannot be determined in advance, because a fixed address might clash with another dynamic library requiring the same address.


There are two commonly used methods for dealing with this problem:

1.Relocation. All pointers and addresses in the code are modified, if necessary, to fit the actual load address. Relocation is done by the linker and the loader.

2.Position-independent code. All addresses in the code are relative to the current position. Shared objects in Unix-like systems use position-independent code by default. This is less efficient than relocation if program run for a long time, especially in 32-bit mode.


The name "position-independent code" actually implies following:

  • The code section contains no absolute addresses that need relocation, but only self relative addresses. Therefore, the code section can be loaded at an arbitrary memory address and shared between multiple processes.

  • The data section is not shared between multiple processes because it often contains writeable data. Therefore, the data section may contain pointers or addresses that need relocation.

  • All public functions and public data can be overridden in Linux. If a function in the main executable has the same name as a function in a shared object, then the version in main will take precedence, not only when called from main, but also when called from the shared object. Likewise, when a global variable in main has the same name as a global variable in the shared object, then the instance in main will be used, even when accessed from the shared object.


This so-called symbol interposition is intended to mimic the behavior of static libraries.

A shared object has a table of pointers to its functions, called procedure linkage table (PLT) and a table of pointers to its variables called global offset table (GOT) in order to implement this "override" feature. All accesses to functions and public variables go through this tables.

p.s. Where dynamic linking cannot be avoided, there are various ways to avoid the timeconsuming features of the position-independent code.

You can read more from this article: http://www.agner.org/optimize/optimizing_cpp.pdf

10
votes

A minor addition to the answers already posted: object files not compiled to be position independent are relocatable; they contain relocation table entries.

These entries allow the loader (that bit of code that loads a program into memory) to rewrite the absolute addresses to adjust for the actual load address in the virtual address space.

An operating system will try to share a single copy of a "shared object library" loaded into memory with all the programs that are linked to that same shared object library.

Since the code address space (unlike sections of the data space) need not be contiguous, and because most programs that link to a specific library have a fairly fixed library dependency tree, this succeeds most of the time. In those rare cases where there is a discrepancy, yes, it may be necessary to have two or more copies of a shared object library in memory.

Obviously, any attempt to randomize the load address of a library between programs and/or program instances (so as to reduce the possibility of creating an exploitable pattern) will make such cases common, not rare, so where a system has enabled this capability, one should make every attempt to compile all shared object libraries to be position independent.

Since calls into these libraries from the body of the main program will also be made relocatable, this makes it much less likely that a shared library will have to be copied.