I often hear the terms 'statically linked' and 'dynamically linked', usually in reference to code written in C, C++ or C#. What are they, what exactly are they talking about, and what are they linking?
5 Answers
There are (in most cases, discounting interpreted code) two stages in getting from source code (what you write) to executable code (what you run).
The first is compilation which turns source code into object modules.
The second, linking, is what combines object modules together to form an executable.
The distinction is made for, among other things, allowing third party libraries to be included in your executable without you seeing their source code (such as libraries for database access, network communications and graphical user interfaces), or for compiling code in different languages (C and assembly code for example) and then linking them all together.
When you statically link a file into an executable, the contents of that file are included at link time. In other words, the contents of the file are physically inserted into the executable that you will run.
When you link dynamically, a pointer to the file being linked in (the file name of the file, for example) is included in the executable and the contents of said file are not included at link time. It's only when you later run the executable that these dynamically linked files are brought in, and they're only brought into the in-memory copy of the executable, not the one on disk.
It's basically a method of deferred linking. There's an even more deferred method (called late binding on some systems) that won't bring in the dynamically linked file until you actually try to call a function within it.
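The "don't bring it in until you call it" idea can be approximated by hand with the POSIX `dlopen` and `dlsym` calls. The following is only a sketch under stated assumptions: the helper name `lazy_cos` is made up for illustration, and the library name `libm.so.6` assumes a glibc-based Linux system. It is not what the system loader does internally, just the same deferral done explicitly.

```c
#include <dlfcn.h>   /* dlopen, dlsym, dlclose (POSIX) */
#include <math.h>    /* NAN */
#include <stddef.h>  /* NULL */

/* Assumption: glibc-based Linux, where the math library is installed
   under this name. Other systems use different file names. */
static const char *const MATH_LIBRARY = "libm.so.6";

typedef double (*unary_math_fn)(double);

/* Starts out NULL: nothing has been loaded or resolved yet. */
static unary_math_fn resolved_cos = NULL;

/* Hypothetical helper: loads the library and looks up cos() the first
   time it is called, then reuses the cached pointer afterwards.
   Returns NAN if the library or the symbol cannot be found. */
double lazy_cos(double x)
{
    if (resolved_cos == NULL) {
        void *handle = dlopen(MATH_LIBRARY, RTLD_LAZY);
        if (handle == NULL) {
            return NAN;
        }
        /* The cast idiom from the POSIX dlsym documentation for storing
           the returned void* into a function pointer. */
        *(void **)(&resolved_cos) = dlsym(handle, "cos");
        if (resolved_cos == NULL) {
            dlclose(handle);
            return NAN;
        }
        /* The handle is deliberately left open for the life of the process. */
    }
    return resolved_cos(x);
}
```

Until `lazy_cos` is called for the first time, the math library is never opened by this code. On glibc versions older than 2.34 the program probably needs to be built with `-ldl`.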
Statically-linked files are 'locked' to the executable at link time so they never change. A dynamically linked file referenced by an executable can change just by replacing the file on the disk.
This allows updates to functionality without having to re-link the code; the loader re-links every time you run it.
This is both good and bad. On one hand, it allows easier updates and bug fixes; on the other, it can lead to programs ceasing to work if the updates are incompatible. This is sometimes responsible for the dreaded "DLL hell" that some people mention: applications can break if you replace a dynamically linked library with one that's not compatible (developers who do this should expect to be hunted down and punished severely, by the way).
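Because the library is picked up at run time, the copy a program ends up using is not necessarily the one it was built against. As a rough illustration, assuming glibc specifically (the header `<gnu/libc-version.h>` and the macros `__GLIBC__` and `__GLIBC_MINOR__` are glibc features), a program can compare the two. The function name `runtime_libc_is_new_enough` is made up for this sketch.

```c
#include <gnu/libc-version.h>  /* gnu_get_libc_version (glibc only) */
#include <stdio.h>             /* sscanf */

/* Returns 1 if the glibc found at run time is at least as new as the one
   whose headers were used at build time, 0 if it is older, and -1 if the
   version string could not be parsed. */
int runtime_libc_is_new_enough(void)
{
    int major = 0, minor = 0;

    /* The run-time version arrives as a string such as "2.35". */
    if (sscanf(gnu_get_libc_version(), "%d.%d", &major, &minor) != 2) {
        return -1;
    }
    /* __GLIBC__ and __GLIBC_MINOR__ were fixed when this file was compiled. */
    if (major != __GLIBC__) {
        return major > __GLIBC__;
    }
    return minor >= __GLIBC_MINOR__;
}
```

A version number alone says nothing about whether the behaviour is actually compatible, which is exactly the gap that incompatible updates fall into.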
As an example, let's look at the case of a user compiling their main.c file for static and dynamic linking.
Phase     Static                    Dynamic
--------  ----------------------    ------------------------
          +---------+               +---------+
          | main.c  |               | main.c  |
          +---------+               +---------+
Compile........|.........................|...................
          +---------+ +---------+   +---------+ +--------+
          | main.o  | | crtlib  |   | main.o  | | crtimp |
          +---------+ +---------+   +---------+ +--------+
Link...........|..........|..............|...........|.......
               |          |              +-----------+
               |          |              |
          +---------+     |         +---------+ +--------+
          |  main   |-----+         |  main   | | crtdll |
          +---------+               +---------+ +--------+
Load/Run.......|.........................|..........|........
          +---------+               +---------+     |
          | main in |               | main in |-----+
          | memory  |               | memory  |
          +---------+               +---------+
You can see in the static case that the main program and C runtime library are linked together at link time (by the developers). Since the user typically cannot re-link the executable, they're stuck with the behaviour of the library.
In the dynamic case, the main program is linked with the C runtime import library (something which declares what's in the dynamic library but doesn't actually define it). This allows the linker to link even though the actual code is missing.
Then, at runtime, the operating system loader does a late linking of the main program with the C runtime DLL (dynamic link library or shared library or other nomenclature).
The owner of the C runtime can drop in a new DLL at any time to provide updates or bug fixes. As stated earlier, this has both advantages and disadvantages.
I think a good answer to this question ought to explain what linking is.
When you compile some C code (for instance), it is translated to machine language. Just a sequence of bytes which, when run, causes the processor to add, subtract, compare, "goto", read memory, write memory, that sort of thing. This stuff is stored in object (.o) files.
Now, a long time ago, computer scientists invented this "subroutine" thing. Execute-this-chunk-of-code-and-return-here. It wasn't too long before they realised that the most useful subroutines could be stored in a special place and used by any program that needed them.
Now in the early days programmers would have to punch in the memory address that these subroutines were located at. Something like CALL 0x5A62. This was tedious and problematic should those memory addresses ever need to be changed.
So, the process was automated. You write a program that calls printf(), and the compiler doesn't know the memory address of printf. So the compiler just writes CALL 0x0000, and adds a note to the object file saying "must replace this 0x0000 with the memory location of printf".
Static linkage means that the linker program (the GNU one is called ld) adds printf's machine code directly to your executable file, and changes the 0x0000 to the address of printf. This happens when your executable is created.
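The placeholder-and-note scheme can be mimicked in a few lines of C. This is a toy model only: the structures, the function name `toy_link`, and the 16-bit little-endian address format are all invented for illustration and do not correspond to any real object file format or to how ld is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A note left in the toy "object file": which placeholder to replace,
   and with which symbol's address. */
struct relocation {
    size_t offset;        /* where in code[] the 16-bit placeholder sits */
    const char *symbol;   /* name whose address belongs there */
};

/* What the toy linker knows: where each symbol ended up. */
struct symbol {
    const char *name;
    uint16_t address;
};

/* Patch every placeholder in code[]. Returns the number of relocations
   that could NOT be resolved (0 means everything was linked). */
size_t toy_link(uint8_t *code, size_t code_len,
                const struct relocation *relocs, size_t n_relocs,
                const struct symbol *symbols, size_t n_symbols)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < n_relocs; i++) {
        const struct symbol *found = NULL;

        for (size_t j = 0; j < n_symbols; j++) {
            if (strcmp(symbols[j].name, relocs[i].symbol) == 0) {
                found = &symbols[j];
                break;
            }
        }
        if (found == NULL || relocs[i].offset + 2 > code_len) {
            unresolved++;   /* unknown symbol or placeholder out of range */
            continue;
        }
        /* Overwrite the 0x0000 placeholder, low byte first. */
        code[relocs[i].offset]     = (uint8_t)(found->address & 0xFF);
        code[relocs[i].offset + 1] = (uint8_t)(found->address >> 8);
    }
    return unresolved;
}
```

Given code bytes `{0xCD, 0x00, 0x00}` (a made-up CALL opcode followed by the placeholder), a relocation `{1, "printf"}` and a symbol `{"printf", 0x5A62}`, the call leaves `0x62, 0x5A` in place of the zeros and returns 0.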
Dynamic linkage means that the above step doesn't happen. The executable file still has a note that says "must replace 0x0000 with the memory location of printf". The operating system's loader needs to find the printf code, load it into memory, and correct the CALL address, each time the program is run.
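A similarly simplified sketch of the run-time half: the "executable" carries only names and empty slots, and a loader fills the slots from whatever library it is handed at start-up. All names here (`toy_load`, `import_entry`, `export_entry`, `combine`, `library_v1`, `library_v2`) are hypothetical, and real loaders are considerably more involved.

```c
#include <stddef.h>
#include <string.h>

typedef int (*binary_fn)(int, int);

/* What a toy "shared library" exports: names and the code behind them. */
struct export_entry {
    const char *name;
    binary_fn fn;
};

/* What the toy "executable" carries instead of the code itself:
   a name to look up and a slot the loader fills in at start-up. */
struct import_entry {
    const char *name;
    binary_fn slot;
};

/* Toy loader: resolve every import against the library's exports.
   Returns 0 on success, -1 if an import is missing from the library. */
int toy_load(struct import_entry *imports, size_t n_imports,
             const struct export_entry *exports, size_t n_exports)
{
    for (size_t i = 0; i < n_imports; i++) {
        imports[i].slot = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].slot = exports[j].fn;
                break;
            }
        }
        if (imports[i].slot == NULL) {
            return -1;   /* the library no longer provides this name */
        }
    }
    return 0;
}

/* Two releases of the same hypothetical library function. */
static int combine_v1(int a, int b) { return a + b; }
static int combine_v2(int a, int b) { return a * b; }

const struct export_entry library_v1[] = { { "combine", combine_v1 } };
const struct export_entry library_v2[] = { { "combine", combine_v2 } };
```

Loading the same import table against `library_v1` and then `library_v2` changes what `imports[0].slot(3, 4)` returns (7, then 12) without touching the "executable", and a library that drops the name makes the load fail outright.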
It's common for programs to call some functions which will be statically linked and other functions which are dynamically linked (on most modern desktop systems, standard library functions like printf are dynamically linked by default). The static ones "become part" of the executable and the dynamic ones "join in" when the executable is run.
There are advantages and disadvantages to both methods, and there are differences between operating systems. But since you didn't ask, I'll end this here.
Statically linked libraries are linked in at build time, when the linker runs. Dynamically linked libraries are loaded at run time. Static linking bakes the library bits into your executable. Dynamic linking only bakes in a reference to the library; the bits for the dynamic library exist elsewhere and could be swapped out later.
None of the above posts actually shows how to statically link something and check that you did it correctly, so I will address that here:
A simple C program
#include <stdio.h>

int main(void)
{
    printf("This is a string\n");
    return 0;
}
Dynamically link the C program
gcc simpleprog.c -o simpleprog
And run file on the binary:
file simpleprog
That will show it is dynamically linked, with output along the lines of:
"simpleprog: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[sha1]=0xf715572611a8b04f686809d90d1c0d75c6028f0f, not stripped"
Instead let us statically link the program this time:
gcc simpleprog.c -static -o simpleprog
Running file on this statically linked binary will show:
file simpleprog
"simpleprog: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=0x8c0b12250801c5a7c7434647b7dc65a644d6132b, not stripped"
And you can see it is happily statically linked. Sadly, however, not all libraries are simple to statically link this way, and some may require extra effort using libtool or linking the object code and C libraries by hand.
Luckily, many embedded C libraries like musl offer static linking options for nearly all, if not all, of their libraries.
Now strace the binary you have created and you can see that there are no libraries accessed before the program begins:
strace ./simpleprog
Now compare with the output of strace on the dynamically linked program and you will see that the statically linked version's strace is much shorter!
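The same check can also be made from C by reading the ELF program headers and looking for a PT_INTERP entry, which is where a dynamically linked executable names the runtime loader it wants. This is a minimal sketch, not a reimplementation of file: it assumes Linux's `<elf.h>`, a 64-bit ELF file, and a file whose byte order matches the machine running the check. The function name `elf_requests_interpreter` is made up.

```c
#include <elf.h>     /* Elf64_Ehdr, Elf64_Phdr, PT_INTERP (Linux) */
#include <stdio.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a PT_INTERP program
   header (it asks for a runtime loader), 0 if it has none, and -1 if the
   file cannot be read or is not a 64-bit ELF file. */
int elf_requests_interpreter(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }

    Elf64_Ehdr header;
    int result = -1;

    if (fread(&header, sizeof header, 1, f) == 1 &&
        memcmp(header.e_ident, ELFMAG, SELFMAG) == 0 &&
        header.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        /* Walk the program header table one entry at a time. */
        for (unsigned i = 0; i < header.e_phnum; i++) {
            Elf64_Phdr ph;
            long pos = (long)(header.e_phoff +
                              (Elf64_Off)i * header.e_phentsize);

            if (fseek(f, pos, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;   /* truncated or unreadable table */
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

Called on the two simpleprog builds above, this should return 1 for the default build and 0 for the one built with -static.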
(I don't know C# but it is interesting to have a static linking concept for a VM language)
Dynamic linking means your program holds only a reference to the functionality it needs and has to find it later. Your language runtime or OS searches for a piece of code matching that reference on the filesystem, the network, or in a compiled code cache, and then takes several steps, such as relocation, to integrate it into your program's image in memory. All of this is done at run time. It can be done either manually or by the compiler. It gives you the ability to update, with a risk of messing up (namely, DLL hell).
Static linking is done at build time: you tell the compiler where all the functional parts are and instruct it to integrate them. There is no searching, no ambiguity, and no ability to update without a recompile. All your dependencies are physically one with your program image.