68
votes

C++ guarantees that variables in a compilation unit (.cpp file) are initialised in order of declaration. For number of compilation units this rule works for each one separately (I mean static variables outside of classes).

But, the order of initialization of variables, is undefined across different compilation units.

Where can I see some explanations about this order for gcc and MSVC (I know that relying on that is a very bad idea - it is just to understand the problems that we may have with legacy code when moving to new GCC major and different OS)?

7

7 Answers

76
votes

As you say the order is undefined across different compilation units.

Within the same compilation unit the order is well defined: The same order as definition.

This is because this is not resolved at the language level but at the linker level. So you really need to check out the linker documentation. Though I really doubt this will help in any useful way.

For gcc: Check out ld

I have found that even changing the order of objects files being linked can change the initialization order. So it is not just your linker that you need to worry about, but how the linker is invoked by your build system. Even try to solve the problem is practically a non starter.

This is generally only a problem when initializing globals that reference each other during their own initialization (so only affects objects with constructors).

There are techniques to get around the problem.

  • Lazy initialization.
  • Schwarz Counter
  • Put all complex global variables inside the same compilation unit.

  • Note 1: globals:
    Used loosely to refer to static storage duration variables that are potentially initialized before main().
  • Note 2: Potentially
    In the general case we expect static storage duration variables to be initialized before main, but the compiler is allowed to defer initialization in some situations (the rules are complex see standard for details).
19
votes

I expect the constructor order between modules is mainly a function of what order you pass the objects to the linker.

However, GCC does let you use init_priority to explicitly specify the ordering for global ctors:

class Thingy
{
public:
    Thingy(char*p) {printf(p);}
};

Thingy a("A");
Thingy b("B");
Thingy c("C");

outputs 'ABC' as you'd expect, but

Thingy a __attribute__((init_priority(300))) ("A");
Thingy b __attribute__((init_priority(200))) ("B");
Thingy c __attribute__((init_priority(400))) ("C");

outputs 'BAC'.

17
votes

Since you already know that you shouldn't rely on this information unless absolutely necessary, here it comes. My general observation across various toolchains (MSVC, gcc/ld, clang/llvm, etc) is that the order in which your object files are passed to the linker is the order in which they will be initialized.

There are exceptions to this, and I do not claim to all of them, but here are the ones I ran into myself:

1) GCC versions prior to 4.7 actually initialize in the reverse order of the link line. This ticket in GCC is when the change happened, and it broke a lot of programs that depended on initialization order (including mine!).

2) In GCC and Clang, usage of constructor function priority can alter the initialization order. Note that this only applies to functions that are declared to be "constructors" (i.e. they should be run just like a global object constructor would be). I have tried using priorities like this and found that even with highest priority on a constructor function, all constructors without priority (e.g. normal global objects, constructor functions without priority) will be initialized first. In other words, the priority is only relative to other functions with priorities, but the real first class citizens are those without priority. To make it worse, this rule is effectively the opposite in GCC prior to 4.7 due to point (1) above.

3) On Windows, there is a very neat and useful shared-library (DLL) entry-point function called DllMain(), which if defined, will run with parameter "fdwReason" equal to DLL_PROCESS_ATTACH directly after all global data has been initialized and before the consuming application has a chance to call any functions on the DLL. This is extremely useful in some cases, and there absolutely is not analogous behavior to this on other platforms with GCC or Clang with C or C++. The closest you will find is making a constructor function with priority (see above point (2)), which absolutely is not the same thing and won't work for many of the use cases that DllMain() works for.

4) If you are using CMake to generate your build systems, which I often do, I have found that the order of the input source files will be the order of their resultant object files given to the linker. However, often times your application/DLL is also linking in other libraries, in which case those libraries will be on the link line after your input source files. If you are looking to have one of your global objects be the very first one to initialize, then you are in luck and your can put the source file containing that object to be the first in the list of source files. However, if you are looking to have one be the very last one to initialize (which can effectively replicate DllMain() behavior!) then you can make a call to add_library() with that one source file to produce a static library, and add the resulting static library as the very last link dependency in your target_link_libraries() call for your application/DLL. Be wary that your global object may get optimized out in this case and you can use the --whole-archive flag to force the linker not to remove unused symbols for that specific tiny archive file.

Closing Tip

To absolutely know the resulting initialization order of your linked application/shared-library, pass --print-map to ld linker and grep for .init_array (or in GCC prior to 4.7, grep for .ctors). Every global constructor will be printed in the order that it will get initialized, and remember that the order is opposite in GCC prior to 4.7 (see point (1) above).

The motivating factor for writing this answer is that I needed to know this information, had no other choice but to rely on initialization order, and found only sparse bits of this information throughout other SO posts and internet forums. Most of it was learned through much experimentation, and I hope that this saves some people the time of doing that!

4
votes

http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 - this link moves around. this one is more stable but you will have to look around for it.

edit: osgx supplied a better link.

1
votes

In addition to Martin's comments, coming from a C background, I always think of static variables as part of the program executable, incorporated and allocated space in the data segment. Thus static variables can be thought of as being initialised as the program loads, prior to any code being executed. The exact order in which this happens can be ascertained by looking at the data segment of map file output by the linker, but for most intents and purposes the initialisation is simultaeneous.

Edit: Depending on construction order of static objects is liable to be non-portable and should probably be avoided.

1
votes

A robust solution is to use a getter function that returns a reference to an static variable. A simple example is shown below, a complex variant in our SDG Controller middleware.

// Foo.h
class Foo {
 public:
  Foo() {}

  static bool insertIntoBar(int number);

 private:
  static std::vector<int>& getBar();
};

// Foo.cpp
std::vector<int>& Foo::getBar() {
  static std::vector<int> bar;
  return bar;
}

bool Foo::insertIntoBar(int number) {
  getBar().push_back(number);
  return true;
}

// A.h
class A {
 public:
  A() {}

 private:
  static bool a1;
};

// A.cpp
bool A::a1 = Foo::insertIntoBar(22);

The initialization would being with the only static member variable bool A::a1. This would then call Foo::insertIntoBar(22). This would then call Foo::getBar() in which the initialization of the static std::vector<int> variable would occur before returning a reference to the initialized object.

If the static std::vector<int> bar were placed directly as a member variable of the Foo class, there would be a possibility, depending on the naming ordering of the source files, that bar would be initialized after insertIntoBar() were called, thereby crashing the program.

If multiple static member variables would call insertIntoBar() during their initialization, the order would not be dependent on the names of the source files, i.e., random, but the std::vector<int> would be guaranteed to be initialized before any values be inserted into it.

0
votes

If you really want to know the final order I would recommend you to create a class whose constructor logs the current timestamp and create several static instances of the class in each of your cpp files so that you could know the final order of initialization. Make sure to put some little time consuming operation in the constructor just so you don't get the same time stamp for each file.