5
votes

I have a problem with wrong symbol resolution. My main program loads a shared library with dlopen and looks up symbols in it with dlsym. Both the program and the library are written in C. The library code is:

int a(int b)
{
  return b+1;
}

int c(int d)
{
  return a(d)+1;
}

In order to make it work on a 64-bit machine, -fPIC is passed to gcc when compiling.

The program is:

#include <dlfcn.h>
#include <stdio.h>

int (*a)(int b);
int (*c)(int d);

int main()
{
  void* lib=dlopen("./libtest.so",RTLD_LAZY);
  a=dlsym(lib,"a");
  c=dlsym(lib,"c");
  int d = c(6);
  int b = a(5);
  printf("b is %d d is %d\n",b,d);
  return 0;
}

Everything runs fine if the program is NOT compiled with -fPIC, but it crashes with a segmentation fault when the program is compiled with -fPIC. Investigation showed that the crash is due to the wrong resolution of symbol a. The crash occurs whenever a is called, whether from the library or from the main program (the latter can be seen by commenting out the line calling c() in the main program).

No problems occur when calling c() itself, probably because c() is not called internally by the library itself, while a() is both a function used internally by the library and an API function of the library.

A simple workaround is to not use -fPIC when compiling the program. But this is not always possible, for example when the code of the main program has to be in a shared library itself. Another workaround is to rename the function pointer a to something else. But I cannot find any real solution.

Replacing RTLD_LAZY with RTLD_NOW does not help.

3
Please show us the compile lines you used, as well as your compiler version. - robert
I suggest not naming a global pointer to function with the same name as the dlsym-ed function it points to. Or just make your pointer to functions local or static variables, or data fields. - Basile Starynkevitch
Thinking more about it, it seems that, since nothing else was specified, the main program also exports the symbols a and c. So symbol a is defined twice (by the main program and by the shared object) and the dynamic linker finds the wrong one. Using the gcc-specific __attribute__ ((visibility ("hidden"))) in the main program may be the right thing to do... any advice? - user377486
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5). Object files use the default Makefile rule $(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< and the library is linked with gcc -shared -o $@ $^ while the executable is linked with gcc -o $@ $^ -g -ldl. Either CFLAGS=-g or CFLAGS='-g -fPIC' is added to the command line. - user377486
@user377486: The best advice is not to use the same names to begin with. It's undefined behavior. If you want the names to seem the same, you could do int (*a_ptr)(int b); and #define a a_ptr (but this seems really ugly for a name like a...), or you could just make it so the function pointer doesn't have external linkage. - R.. GitHub STOP HELPING ICE

3 Answers

3
votes

I suspect that there is a clash between two global symbols. One solution is to declare a in the main program as static. Alternatively, the Linux man page for dlopen mentions the RTLD_DEEPBIND flag, a Linux-only extension, which you can pass to dlopen and which causes the library to prefer its own symbols over global symbols.
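For illustration, here is a minimal sketch of the static variant, assuming the library exports a and c as in the question. The helper name load_api and the typedef int_fn are made up for this example and are not part of the original program. It also checks the results of dlopen and dlsym, which the program in the question does not do.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*int_fn)(int);  /* hypothetical typedef for both API functions */

/* static gives the pointers internal linkage, so the executable
   no longer defines global symbols named a and c. */
static int_fn a;
static int_fn c;

/* Hypothetical helper: loads the library at path and looks up a and c.
   Returns 0 on success and -1 on failure. */
static int load_api(const char *path)
{
  void *lib = dlopen(path, RTLD_LAZY);
  if (lib == NULL) {
    fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return -1;
  }

  /* POSIX allows converting the void* returned by dlsym
     to a function pointer. */
  a = (int_fn)dlsym(lib, "a");
  c = (int_fn)dlsym(lib, "c");
  if (a == NULL || c == NULL) {
    const char *err = dlerror();
    fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
    a = NULL;
    c = NULL;
    dlclose(lib);
    return -1;
  }
  return 0;
}
```

After load_api("./libtest.so") returns 0, main can call c(6) and a(5) through the pointers as before. The sketch does not use RTLD_DEEPBIND; with glibc that flag is only declared when _GNU_SOURCE is defined before including dlfcn.h.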

0
votes

It seems this issue can occur in one more case, as it did for me. I have a program and a couple of dynamically linked libraries. When I added one more library, I used in it a function from a static library (also mine) and forgot to add that static library to the link list. The linker did not warn me about this, but the program crashed with a segmentation fault.

Maybe this will help someone.

0
votes

FWIW, I ran into a similar problem when compiling as C++ and forgetting about name mangling. A solution there is to use extern "C".
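As a sketch of that, assuming the library source from the question might also be built with a C++ compiler, the declarations can be wrapped in a guard so that the exported names stay a and c and dlsym can find them. The guard is skipped when the file is compiled as C.

```c
/* Declarations, as they might appear in a header for the library. */
#ifdef __cplusplus
extern "C" {  /* C++ only: keep the names unmangled */
#endif

int a(int b);
int c(int d);

#ifdef __cplusplus
}
#endif

/* Definitions: same code as the library in the question.
   In C++ they keep the C linkage of the declarations above. */
int a(int b)
{
  return b+1;
}

int c(int d)
{
  return a(d)+1;
}
```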