Case 1: multiple definitions of a function

module1.cpp:

void f(){}

main.cpp:

void f(){} // error LNK2005: "void __cdecl f(void)" (?f@@YAXXZ) already defined in module1.obj
int main(){} 

Case 2: multiple definitions of a class

module1.cpp:

class C{};

main.cpp:

class C{}; // OK
int main(){} 

In Case 1, as expected, the (Microsoft) linker comes across two definitions of the same function and issues an error. In Case 2, it allows two definitions of the same class.

Question 1: Why doesn't the linker complain about multiple definitions of the same class? Is it because a function name is just the name of the address where its instructions start, whereas a class name is the name of a new type?

Furthermore, the linker won't complain even if the definitions of the class differ (I added functions that call the class constructors so that they appear in the symbol table):

module1.cpp:

class MyClass
{
    int n;
public: 
    MyClass() : n(123){}
};

void func()
{
   MyClass c;
}

main.cpp:

class MyClass
{
   float n;
public: 
   MyClass() : n(3.14f){}
};

int main()
{
   MyClass c;
} 

I set the compiler option so that it generates a COD file alongside each OBJ file. I can see that both constructors appear under the same mangled name (??0MyClass@@QAE@XZ), each in its own unit (COD file). I would expect that if a symbol is referenced in a module, the linker uses the definition from that same module, if one exists, and otherwise uses the definition from the module where the symbol is defined. That is not what happens, and this can be dangerous, because the linker seems to pick the symbol from the first object file it comes across:

module1.h:

#ifndef MODULE1_H_
#define MODULE1_H_

void func1();

#endif

module1.cpp:

#include <iostream>
#include "module1.h"

class MyClass
{
    int myValue;

public:
    MyClass() : myValue(123)
    {
        std::cout << "MyClass::MyClass() [module1]" << std::endl;
    }   

    void foo()
    {       
        std::cout << "MyClass::foo() [module1]: n = " << myValue << std::endl;
    }
};

void func1()
{
    MyClass c;
    c.foo();
}

module2.cpp:

#include <iostream>

class MyClass
{   
public:
    MyClass()
    {
        std::cout << "MyClass::MyClass() [module2]" << std::endl;
    }   
};

// it is necessary that module contains at least one function that creates MyClass object
void test2()
{
    MyClass c;
}

main.cpp:

#include "module1.h"

int main()
{
    func1();
}

If object files are listed in this order when passing to the linker:

module2.obj module1.obj main.obj

the linker picks the constructor MyClass::MyClass from the first object file but MyClass::foo from the second one, so the output is unexpected (wrong):

MyClass::MyClass() [module2]
MyClass::foo() [module1]: n = 1

If object files are listed in this order when passing to the linker:

module1.obj module2.obj main.obj

the linker picks both MyClass members from the first object file:

MyClass::MyClass() [module1]
MyClass::foo() [module1]: n = 123

Question 2: Why are linkers designed to allow multiple class definitions, which can lead to the errors described above? Isn't it wrong that the result of linking depends on the order of the object files?

It seems that the linker picks the first symbol definition it comes across when scanning the object files and then silently discards all subsequent duplicate definitions.
Question 3: Is this how the linker builds its symbol lookup table?

This question sounds very much like "what will happen if I invoke undefined behavior because of an ODR violation, and why". - user784668
Actually, I don't know how you got that output... I tried your code and everything was fine: the class from module 1 was instantiated and its method was called. Can you give me any reason why you should not be allowed to define classes with the same name inside a cpp file? - besworland
@Fanael Maybe he's just curious how the compiler works at the low level. - hamstergene
@hamstergene: but even in this case, it doesn't make much sense to know what happens when the behavior is undefined. - user784668
@besworland In one of my test projects I have classes with the same name in different cpp files, and I started experiencing unexpected behaviour (e.g. one class has boost::thread as a member, and a move operation would always fail with an access violation; when I excluded from the project the cpp file with the other definition of the class with the same name, the problem went away). This is why I started digging into this. - Bojan Komazec

2 Answers

4
votes

Regarding your Question 1: A class or an inline function may be defined in more than one translation unit, as long as you don't violate the One Definition Rule (ODR), which requires all of those definitions to be identical.

When you define a function within a class, it is implicitly inline. Your two definitions of MyClass and of its constructor differ, so you violated the ODR and invoked undefined behaviour.

The rationale for this behaviour is: when you have an inline function in a class, its definition is visible in a number of compilation units, with no compilation unit obviously being the "preferred" one. Your toolchain can, however, rely on the ODR and assume that all definitions of an inline function have the same semantics. Therefore, the linker can pick any one of the inline function definitions when linking, since they are all supposed to be the same.

The solution to the problem is simple: Don't violate the ODR.
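One way to follow that advice when a class is only meant to be used inside a single .cpp file is to give it internal linkage with an unnamed namespace. The sketch below is a hypothetical rewrite of the question's module1.cpp, not the asker's code: func1 is changed to return a string instead of printing (an assumption made so the result is easy to check), and describe() is an invented member name.

```cpp
#include <string>

// Hypothetical rewrite of module1.cpp from the question.
// Everything inside the unnamed namespace has internal linkage, so this
// MyClass cannot collide with a MyClass defined in another .cpp file.
namespace {

class MyClass
{
    int myValue;

public:
    MyClass() : myValue(123) {}

    // describe() is an illustrative name, not part of the original code.
    std::string describe() const
    {
        return "module1: n = " + std::to_string(myValue);
    }
};

} // unnamed namespace

// func1 keeps external linkage; it is the only name other files can see.
// Returning a string (instead of printing) is an assumption for testability.
std::string func1()
{
    MyClass c;
    return c.describe();
}
```

If module2.cpp wraps its own MyClass the same way, each translation unit gets a distinct type and the linker has nothing to merge. If the class really is meant to be shared, the usual approach is to put the single definition in a header and include it everywhere.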

3
votes

Q1: Because function definitions generate symbols for linkage and class definitions do not.

Note that this is not true in the general case: some functions do not participate in linkage (e.g. global functions declared with the static keyword), while some classes indirectly do (e.g. classes with virtual methods or static member variables).
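As a rough illustration of which declarations end up as linker-visible symbols, here is a small single-file sketch. All names in it are made up for the example, and the comments describe typical toolchain behaviour rather than anything the standard guarantees about object files.

```cpp
// External linkage: the definition produces a symbol that other object files
// can reference. A second definition in another .cpp is a duplicate-symbol error.
int external_function() { return 1; }

// Internal linkage: private to this translation unit. Another .cpp file may
// define its own internal_function without any conflict.
static int internal_function() { return 2; }

// A plain class definition describes a type; by itself it produces no symbol.
struct PlainClass { int value; };

// A static data member has external linkage; its out-of-class definition
// produces a symbol, so the class indirectly takes part in linkage.
struct WithStatic { static int counter; };
int WithStatic::counter = 3;

// A class with virtual functions typically makes the compiler emit a vtable,
// and the in-class function bodies are implicitly inline.
struct WithVirtual
{
    virtual ~WithVirtual() = default;
    virtual int id() const { return 4; }
};

// Hypothetical helper that touches each of the above.
int sum_all()
{
    WithVirtual v;
    return external_function() + internal_function() + WithStatic::counter + v.id();
}
```

Dumping the symbol table of the resulting object file (for example with dumpbin or nm) should show entries for the functions and for WithStatic::counter, but nothing for PlainClass.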

Q2, Q3: The linker only works with symbol names; it does not know whether a symbol is a variable, a function or something else. It takes a set of modules M1, M2, M3, ..., Mn that were generated by compilers. These may be different compilers that know nothing about each other. Each module may contain symbols, e.g. Mi.A, Mi.B, Mi.C, Mi.foo, and may refer to external symbols, e.g. ??.E, ??.F, ??.G, ??.printf. (The linker also takes libraries, which are just archives of modules.)

The linker's job is to resolve each external symbol reference by finding the module that contains a symbol with that name.

For example, if M1 contains main and refers to ??.printf and ??.foo, and M2 contains foo, the linker will replace all references to ??.foo with the address of M2.foo, and all references to ??.printf with the address of standard_c_library.printf.

That's basically all that a linker does: it merges modules into a single binary, replaces each reference to a symbol with its final memory address, and throws away unused symbols.
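To make the order dependence from the question concrete, here is a toy model of the symbol table a linker might build. It is only a sketch under simplifying assumptions: Symbol, Module, resolve and the mergeable flag are invented for illustration, and the "first definition wins" rule for inline functions mirrors what the question observed rather than a documented guarantee of any particular linker.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only; real object formats and linkers are far more involved.
struct Symbol
{
    std::string name;
    bool mergeable; // true for inline functions, false for ordinary definitions
};

struct Module
{
    std::string file;
    std::vector<Symbol> defined;
};

// Scans modules in command-line order and returns, for each symbol name,
// the file whose definition was kept.
std::map<std::string, std::string> resolve(const std::vector<Module>& modules)
{
    std::map<std::string, std::string> keptIn;
    std::map<std::string, bool> firstWasMergeable;

    for (const Module& m : modules) {
        for (const Symbol& s : m.defined) {
            if (keptIn.find(s.name) == keptIn.end()) {
                // First definition seen: record it.
                keptIn[s.name] = m.file;
                firstWasMergeable[s.name] = s.mergeable;
            } else if (!s.mergeable || !firstWasMergeable[s.name]) {
                // Two ordinary definitions: the LNK2005 situation from Case 1.
                throw std::runtime_error("duplicate symbol: " + s.name);
            }
            // Otherwise: a mergeable duplicate, silently discarded. This is
            // only safe if all copies are identical, i.e. if the ODR holds.
        }
    }
    return keptIn;
}
```

Feeding it the question's modules in the order module2.obj, module1.obj, main.obj keeps the constructor from module2.obj and foo from module1.obj, which matches the mixed output the asker saw; swapping the first two modules keeps both from module1.obj.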