I'm investigating shared libraries. As I understand it, when a program is linked against a shared library to form an executable, unresolved symbols remain unresolved until they are first called, at which point lazy binding resolves them. Based on that, I assumed that using a function that isn't defined anywhere would not produce a linker error, because the resolving would be left to the dynamic linker. But when I ran the following commands in the terminal:

gcc -c foo.c -fPIC
gcc -shared foo.o -o libfoos.so
gcc main.c -Wl,-rpath=. libfoos.so

I got an "undefined reference to 'foo2'" error.

This was all done with the following files in the same directory:

foo.h:

#ifndef __FOO_H__
#define __FOO_H__

int foo(int num);

#endif /* __FOO_H__ */

main.c:

#include <stdio.h>

#include "foo.h"

int main()
{
    int a = 5;
    printf("%d * %d = %d\n", a, a, foo(a));
    printf("%d + %d = %d\n", a, a, foo2(a));
    
    return (0);
}

and foo.c:

#include "foo.h"

int foo(int num)
{
    return (num * num);
}

So my questions are:

  1. Is it true that symbols remain unresolved until they are called for the first time? If so, then how come I'm getting an error at linking time?
  2. I'm guessing that maybe some check needs to be made for the very existence of the symbols (foo and foo2 in my example) in the shared library, already at link time. If so, then why not resolve them at the same time, since we're accessing some information in the library anyway?

Thanks!

1 Answer

  1. Is it true that symbols remain unresolved until they are called for the first time?

I think you may be confusing the requirements and semantics of the source language (C) with the execution semantics of dynamic shared object formats and implementations, such as ELF.

The C language does not specify when symbols are resolved, only that there must be a definition for each identifier that is used to access an object or call a function.

Different DSO formats have different properties in this area. With ELF, for example, resolution of dynamic symbols can be deferred until the symbol is first referenced, or it can be performed immediately upon loading the DSO. This is configurable both at build time (for example, with GNU ld's -z lazy and -z now options) and at run time (for example, by setting the LD_BIND_NOW environment variable for the glibc dynamic loader). The semantics of other DSO formats may differ in this and other regards.

Bottom line: no, it is not necessarily true that dynamic symbols are resolved only when they are first referenced, but that might be the default for your particular implementation and environment.

If so, then how come I'm getting an error at linking time?

The linker checks at build time that every symbol the program references has a definition somewhere among the objects and libraries it was given, which is what the C language requires. It is perfectly reasonable, and in fact desirable, for it to do so even when linking against shared objects: if a symbol cannot be resolved, you want to know about the problem and fix it before people try to use the program. This is unrelated to whether dynamic symbol resolution is deferred at run time.

  2. I'm guessing that maybe some check needs to be made for the very existence of the symbols (foo and foo2 in my example) in the shared library, already at link time.

Yes, that's basically it.

If so, then why not resolve them at the same time, since we're accessing some information in the library anyway?

How do you know that doesn't happen?

In a DSO system that does not feature symbol relocation, that can be done and is done. The dynamism in such a DSO system is primarily in whether a given library is loaded at all. DSOs in such a system have fixed load addresses and the symbols exported from them also have fixed addresses. This allows executables to be (much) smaller and for system memory to be used (much) more efficiently, relative to statically-linked executables.

But there are big practical problems with such an approach. For example, you have to contend with address-space collisions between different DSOs, updating DSOs is difficult and risky, and having well-known addresses is a security risk. Therefore, most modern DSO systems feature symbol relocation. In such a system, DSOs' load addresses are determined dynamically, at run time, and typically even the relative offsets of their exported symbols are not fixed. This is the kind of DSO system that supports deferred symbol resolution. With such a system, symbols from other DSOs cannot be bound to addresses at build time, because those addresses are not known until run time and might even vary from run to run.