70
votes

GCC, MSVC, LLVM, and probably other toolchains have support for link-time (whole program) optimization to allow optimization of calls among compilation units.

Is there a reason not to enable this option when compiling production software?

7
See Why not always use compiler optimization?. The answers there are equally applicable here. – Mankarse
@Mankarse He asks "when compiling production software", so most of the answers there don't apply. – Ali
@user2485710: Do you have documentation for incompatibility with ld? What I read in the current gcc docs (gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) and in a somewhat old wiki (gcc.gnu.org/wiki/LinkTimeOptimization) either says nothing about ld incompatibilities (gcc docs) or explicitly states compatibility (wiki). Judging from the mode of LTO operation, namely having additional information in the object files, my guess would be that the object files maintain compatibility. – Peter - Reinstate Monica
Enabling -O2 makes a difference of ca. +5 seconds on a 10 minute build here. Enabling LTO makes a difference of ca. +3 minutes, and sometimes ld runs out of address space. This is a good reason to always compile with -O2 (so the executables that you debug are binary-identical with the ones you'll ship!) and not to use LTO until it is mature enough (which includes acceptable speed). Your mileage may vary. – Damon
@Damon: The release build is not the build I've been debugging, but the build which survived testing. Test gets a separate build anyhow, installed on a clean machine (so I know the install package isn't missing any dependencies). – MSalters

7 Answers

50
votes

I assume that by "production software" you mean software that you ship to the customers / goes into production. The answers at Why not always use compiler optimization? (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code (so the software is still in the development phase -- not in production).

6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:

  • Link time optimization occasionally introduced subtle bugs, see for example Link-time optimization for the kernel. I assume this is less of an issue as of 2020. Safeguard against these kinds of compiler and linker bugs: Have appropriate tests to check the correctness of your software that you are about to ship.
  • Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
  • Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.

As of 2020, I would try to use LTO by default on any of my projects.
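For reference, and assuming GCC (other toolchains have similar switches), enabling LTO is usually just a matter of passing -flto at both the compile and the link step:

gcc -O2 -flto -c a.c
gcc -O2 -flto -c b.c
gcc -O2 -flto a.o b.o -o prog   # -flto (and the optimization flags) are needed at link time too, since that is where the cross-unit optimization happens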

11
votes

This recent question raises another possible (but rather specific) case in which LTO may have undesirable effects: if the code in question is instrumented for timing, and separate compilation units have been used to try to preserve the relative ordering of the instrumented and instrumenting statements, then LTO has a good chance of destroying the necessary ordering.

I did say it was specific.
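To make the scenario concrete, here is a rough sketch with made-up file and function names; the measured code deliberately lives in its own translation unit, which is what keeps it in place without LTO:

/* work.c -- the instrumented code, kept in a separate translation unit */
int work(int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += i * i;
    return s;
}

/* bench.c -- the instrumenting code */
#include <stdio.h>
#include <time.h>

int work(int n);   /* opaque without LTO, so the call cannot move across the clock() reads */

int main(void)
{
    clock_t t0 = clock();
    int r = work(1000000);
    clock_t t1 = clock();
    printf("result %d took %ld ticks\n", r, (long)(t1 - t0));
    return 0;
}

With LTO, work() can be inlined into main(); since the loop has no side effects, the compiler is then free to hoist it above the first clock() call, sink it below the second, or fold it away entirely, so the measured interval no longer brackets the work.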

6
votes

If you have well written code, it should only be advantageous. You may hit a compiler/linker bug, but this goes for all types of optimisation and is rare.

Biggest downside is it drastically increases link time.

2
votes

Apart from this,

Consider a typical example from an embedded system:

void function1(void) { /*Do something*/} //located at address 0x1000 
void function2(void) { /*Do something*/} //located at address 0x1100
void function3(void) { /*Do something*/} //located at address 0x1200

With predefined addresses, these functions can be called through their addresses like below:

 ((void (*)(void))0x1000)(); //expected to call function1
 ((void (*)(void))0x1100)(); //expected to call function2
 ((void (*)(void))0x1200)(); //expected to call function3

LTO can lead to unexpected behavior here, because it may inline, move, or resize these functions so that they no longer sit at the expected addresses.

1
votes

Given that the code is implemented correctly, link-time optimization should not have any impact on the functionality. However, there are scenarios where code that is not 100% correct typically just works without link-time optimization, but stops working once link-time optimization is enabled. There are similar situations when switching to higher optimization levels, like from -O2 to -O3 with gcc.

That is, depending on your specific context (age of the code base, size of the code base, depth of tests, whether you are starting the project or are close to a final release, ...) you would have to judge the risk of such a change.

One scenario where link-time-optimization can lead to unexpected behavior for wrong code is the following:

Imagine you have two source files, read.c and client.c, which you compile into separate object files. In read.c there is a function read that does nothing but read from a specific memory address. The content at this address should be marked as volatile, but unfortunately that was forgotten. From client.c the function read is called several times from the same function. Since read only performs a single read from the address and there is no optimization beyond the boundaries of the read function, read will access the respective memory location every time it is called. Consequently, every time read is called from client.c, the code in client.c gets a freshly read value from the address, just as if volatile had been used.

Now, with link-time optimization, the tiny function read from read.c is likely to be inlined wherever it is called from client.c. Due to the missing volatile, the compiler will now realize that the code reads several times from the same address, and may therefore optimize away the memory accesses. Consequently, the code starts to behave differently.
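A minimal sketch of what the two files might look like (the file names are from the description above; the register address and the polling loop are made up for the example):

/* read.c */
#include <stdint.h>

uint32_t read(void)
{
    /* should have been a volatile read, but the qualifier was forgotten */
    return *(uint32_t *)0x40000000;
}

/* client.c */
#include <stdint.h>

uint32_t read(void);

void wait_until_ready(void)
{
    /* Without LTO, every iteration makes an opaque call and thus a real load.
       With LTO, read() can be inlined and the repeated loads from the same
       non-volatile address merged, so this loop may spin on a stale value forever. */
    while (read() == 0)
    {
    }
}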

0
votes

LTO support is buggy, and LTO-related issues have the lowest priority for compiler developers. For example: mingw-w64-x86_64-gcc-10.2.0-5 works fine with LTO, while mingw-w64-x86_64-gcc-10.2.0-6 segfaults with a bogus address. We only noticed because the Windows CI stopped working.

Please refer to the following issue as an example.

0
votes

Rather than mandating that all implementations support the semantics necessary to accomplish all tasks, the Standard allows implementations intended to be suitable for various tasks to extend the language by defining semantics in corner cases beyond those mandated by the C Standard, in ways that would be useful for those tasks.

An extremely popular extension of this form is to specify that cross-module function calls will be processed in a fashion consistent with the platform's Application Binary Interface without regard for whether the C Standard would require such treatment.

Thus, if one makes a cross-module call to a function like:

uint32_t read_uint32_bits(void *p)
{
  return *(uint32_t*)p;
}

the generated code would read the bit pattern in a 32-bit chunk of storage at address p, and interpret it as a uint32_t value using the platform's native 32-bit integer format, without regard for how that chunk of storage came to hold that bit pattern. Likewise, if a compiler were given something like:

uint32_t read_uint32_bits(void *p);
uint32_t f1bits, f2bits;
void test(void)
{
  float f;
  f = 1.0f;
  f1bits = read_uint32_bits(&f);
  f = 2.0f;
  f2bits = read_uint32_bits(&f);
}

the compiler would reserve storage for f on the stack, store the bit pattern for 1.0f to that storage, call read_uint32_bits and store the returned value, store the bit pattern for 2.0f to that storage, call read_uint32_bits and store that returned value.

The Standard provides no syntax to indicate that the called function might read the storage whose address it receives using type uint32_t, nor to indicate that the pointer the function was given might have been written using type float, because implementations intended for low-level programming already extended the language to support such semantics without using special syntax.

Unfortunately, adding in Link Time Optimization will break any code that relies upon that popular extension. Some people may view such code as broken, but if one recognizes the Spirit of C principle "Don't prevent programmers from doing what needs to be done", the Standard's failure to mandate support for a popular extension cannot be viewed as intending to deprecate its usage if the Standard fails to provide any reasonable alternative.