It's because of the requirement for separate compilation and because templates are instantiation-style polymorphism.
Lets get a little closer to concrete for an explanation. Say I've got the following files:
- foo.h
- declares the interface of
class MyClass<T>
- foo.cpp
- defines the implementation of
class MyClass<T>
- bar.cpp
Separate compilation means I should be able to compile foo.cpp independently from bar.cpp. The compiler does all the hard work of analysis, optimization, and code generation on each compilation unit completely independently; we don't need to do whole-program analysis. It's only the linker that needs to handle the entire program at once, and the linker's job is substantially easier.
bar.cpp doesn't even need to exist when I compile foo.cpp, but I should still be able to link the foo.o I already had together with the bar.o I've only just produced, without needing to recompile foo.cpp. foo.cpp could even be compiled into a dynamic library, distributed somewhere else without foo.cpp, and linked with code they write years after I wrote foo.cpp.
"Instantiation-style polymorphism" means that the template MyClass<T>
isn't really a generic class that can be compiled to code that can work for any value of T
. That would add overhead such as boxing, needing to pass function pointers to allocators and constructors, etc. The intention of C++ templates is to avoid having to write nearly identical class MyClass_int
, class MyClass_float
, etc, but to still be able to end up with compiled code that is mostly as if we had written each version separately. So a template is literally a template; a class template is not a class, it's a recipe for creating a new class for each T
we encounter. A template cannot be compiled into code, only the result of instantiating the template can be compiled.
So when foo.cpp is compiled, the compiler can't see bar.cpp to know that MyClass<int>
is needed. It can see the template MyClass<T>
, but it can't emit code for that (it's a template, not a class). And when bar.cpp is compiled, the compiler can see that it needs to create a MyClass<int>
, but it can't see the template MyClass<T>
(only its interface in foo.h) so it can't create it.
If foo.cpp itself uses MyClass<int>
, then code for that will be generated while compiling foo.cpp, so when bar.o is linked to foo.o they can be hooked up and will work. We can use that fact to allow a finite set of template instantiations to be implemented in a .cpp file by writing a single template. But there's no way for bar.cpp to use the template as a template and instantiate it on whatever types it likes; it can only use pre-existing versions of the templated class that the author of foo.cpp thought to provide.
You might think that when compiling a template the compiler should "generate all versions", with the ones that are never used being filtered out during linking. Aside from the huge overhead and the extreme difficulties such an approach would face because "type modifier" features like pointers and arrays allow even just the built-in types to give rise to an infinite number of types, what happens when I now extend my program by adding:
- baz.cpp
- declares and implements
class BazPrivate
, and uses MyClass<BazPrivate>
There is no possible way that this could work unless we either
- Have to recompile foo.cpp every time we change any other file in the program, in case it added a new novel instantiation of
MyClass<T>
- Require that baz.cpp contains (possibly via header includes) the full template of
MyClass<T>
, so that the compiler can generate MyClass<BazPrivate>
during compilation of baz.cpp.
Nobody likes (1), because whole-program-analysis compilation systems take forever to compile , and because it makes it impossible to distribute compiled libraries without the source code. So we have (2) instead.