I recently discovered LLVM's linker, lld, which is praised for very fast linking. I tested it and the results were impressive: in my case, link time dropped dramatically compared to gold.

However, my knowledge of link-time optimization is limited. As far as I understand from reading about it online, the compiler puts extra data into the object files, a representation of its internal structures, which is then used at the linking stage. So my concern is whether link-time optimization (and its benefits) is affected by mixing GCC with lld. I would appreciate an explanation!

I used gcc version 9.2.0 and lld version 10.0.0.

The command I used to generate the object files:

/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -O3 -g -DNDEBUG -o my_object.cpp.o -c my_source_file.cpp

For linking:

#-fuse-ld=gold
/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -pie -fuse-ld=gold -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -static-libstdc++ -static-libgcc -Wl,--threads -Wl,--thread-count,1
#-fuse-ld=lld
/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -pie -fuse-ld=lld -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -static-libstdc++ -static-libgcc -Wl,--threads -Wl,
1

1 Answer

After some research, I concluded that no LTO is performed when GCC-compiled objects are linked with lld. Here is what I did:

Based on this somewhat vague presentation: https://www.slideshare.net/chimerawang/gcc-lto, I found that the linker does not perform the optimization itself. Instead, after reading the symbols from all the object files, it passes that information to lto-wrapper, which then performs the optimization by launching further processes. To check this, I compiled a hello-world .cpp file with the -v flag and indeed saw that sequence of calls: collect2 (the program GCC uses to invoke the linker) -> lto-wrapper -> lto1. However, that happened only with the default linker or gold. With -fuse-ld=lld, only collect2 was invoked. This was the first thing that made me believe LTO was not being done at all.

But maybe lld had internalized the LTO process, so that it runs without calling any other process. So I made a second test to check whether LTO takes place (based on this article). In one .cpp file I call, 100,000,000 times, a function that is defined in another .cpp file and does nothing. With plain -O2, the resulting binary runs in ~200 ms, because the compiler cannot optimize away calls to a function whose body it cannot see. When I also pass -flto and link with either ld or gold, the resulting binary runs in ~2 ms. With lld, however, the binary again runs in ~200 ms: lld with LTO is as slow as lld without LTO. No sign of optimization whatsoever.

Note also that with lld, the link command fails unless the objects are compiled with -ffat-lto-objects. This flag makes the object files larger because the compiler emits not only the LTO intermediate representation (GIMPLE bytecode) but also regular object code that can be linked without LTO.
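
A minimal sketch of the cross-file benchmark described above, assuming GCC on x86-64 Linux. The original test code was not posted, so the file names noop.cpp and bench.cpp and the function time_noop_calls are hypothetical. Split the block at the marker comments into two files, compile each one separately with -O2 (adding -flto, and -ffat-lto-objects for the lld build, in the LTO variants), and link with the -fuse-ld= value being tested.

```cpp
// ---- noop.cpp (hypothetical file name): the empty callee ----
// Compiled as its own translation unit, so without LTO the caller
// below cannot see that the body is empty.
void noop() {}

// ---- bench.cpp (hypothetical file name): the caller ----
#include <cassert>
#include <chrono>

void noop();  // only a declaration here; the definition lives in noop.cpp

// Calls noop() `iterations` times and returns the elapsed wall-clock
// time in milliseconds. With LTO, GCC can inline the empty body and
// drop the loop; without it, every call is actually made.
long long time_noop_calls(long long iterations) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < iterations; ++i) {
        noop();
    }
    const auto stop = std::chrono::steady_clock::now();
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count());
}
```

A main in bench.cpp that prints time_noop_calls(100000000) completes the program. If the link step really performs LTO, the calls can be optimized away and the reported time collapses; if the linker only uses the regular code stored in the fat objects, all the calls remain. Compiling both halves as a single file defeats the purpose, since the compiler then sees the empty body even without LTO.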

So, given the run time of the lld-linked binary and the fact that the objects must be compiled with -ffat-lto-objects, I concluded that with lld no LTO is performed at all: lld simply links the non-LTO code that the compiler also placed in the fat object files.