First, note that all memory for mutable C/C++ objects unquestionably has to be untyped and unspecialized, usable for any mutable object. (I guess the memory for global const variables could hypothetically be typed, but there is no point in such hyper-complication for such a tiny corner case.) Unlike Java, C++ has no typed allocation of a dynamic object: `new Class(args)` in Java is a typed object creation, the creation of an object of a well-defined type that might live in typed memory. On the other hand, the C++ expression `new Class(args)` is just a thin typing wrapper around type-less memory allocation, equivalent to `new (operator new(sizeof(Class))) Class(args)`: the object is created in "neutral memory". Changing that would mean changing a very big part of C++.
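That equivalence can be spelled out in a short sketch. `Widget`, `make_with_new`, `make_by_hand` and `destroy_by_hand` are hypothetical names used only for illustration. The hand-written version leaves out one thing a real new-expression does: releasing the memory if the constructor throws.

```cpp
#include <new>  // operator new, operator delete, placement new

// Hypothetical class, for illustration only.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

// Ordinary new-expression: allocation and construction in one step.
Widget* make_with_new(int v) {
    return new Widget(v);
}

// The same two steps written out by hand.
Widget* make_by_hand(int v) {
    void* raw = operator new(sizeof(Widget));  // type-less memory
    return new (raw) Widget(v);                // construct the object in it
}

// Mirror image of make_by_hand: destroy, then release the raw memory.
void destroy_by_hand(Widget* w) {
    w->~Widget();
    operator delete(w);
}
```

Nothing about the memory returned by `operator new` records that it is going to hold a `Widget`; the type only enters the picture when the constructor runs.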
Forbidding the bit copy operation (whether done by `memcpy` or by an equivalent user-defined byte-by-byte copy) on some types gives the implementation a lot of freedom for polymorphic classes (those with virtual functions) and for other so-called "virtual classes" (not a standard term), that is, classes that use the `virtual` keyword.
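The standard expresses this restriction through the "trivially copyable" category, which can be queried with `std::is_trivially_copyable`. The sketch below uses hypothetical class names (`Plain`, `Poly`, `Base`, `VirtDerived`) and a hypothetical helper `may_memcpy` to show that both kinds of "virtual classes" fall outside that category.

```cpp
#include <type_traits>

// Hypothetical classes, for illustration only.
struct Plain { int x; };                         // no use of `virtual`
struct Poly  { virtual void f() {} int x; };     // virtual function
struct Base  { int b; };
struct VirtDerived : virtual Base { int d; };    // virtual base class

// True only when the standard sanctions a byte-wise copy of T.
template <class T>
constexpr bool may_memcpy() {
    return std::is_trivially_copyable<T>::value;
}

static_assert(may_memcpy<Plain>(), "plain struct is trivially copyable");
static_assert(!may_memcpy<Poly>(), "virtual functions rule it out");
static_assert(!may_memcpy<VirtDerived>(), "virtual bases rule it out");
```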
The implementation of polymorphic classes could use a global associative map that associates the address of each polymorphic object with its virtual functions. I believe that was an option seriously considered during the design of the first iterations of the C++ language (or even "C with classes"). That map of polymorphic objects might use special CPU features and special associative memory (such features aren't exposed to the C++ user).
Of course we know that all practical implementations of virtual functions use vtables (a constant record describing all dynamic aspects of a class) and put a vptr (vtable pointer) in each polymorphic base class subobject, as that approach is extremely simple to implement (at least for the simplest cases) and very efficient. There is no global registry of polymorphic objects in any real-world implementation, except possibly in a debug mode (I don't know of any such debug mode).
The C++ standard made the lack of a global registry somewhat official by saying that you can skip the destructor call when you reuse the memory of an object, as long as you don't depend on the "side effects" of that destructor call. (I believe this means the "side effects" are the user-written ones, that is, the body of the destructor, not the work that the implementation automatically adds to the destructor.)
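A minimal sketch of that rule, with a hypothetical class `Sample` and function `reuse_without_destructor`: the storage of the first object is reused for a second one without any destructor call in between, which is fine here because nothing depends on the (empty) destructor body.

```cpp
#include <new>  // placement new

// Hypothetical class whose destructor has no side effects anyone relies on.
struct Sample {
    int n;
    explicit Sample(int v) : n(v) {}
    ~Sample() {}
};

int reuse_without_destructor() {
    alignas(Sample) unsigned char storage[sizeof(Sample)];

    new (storage) Sample(1);                   // first object
    // No destructor call: reusing the storage ends the first object's lifetime.
    Sample* second = new (storage) Sample(2);  // second object, same bytes

    int result = second->n;
    second->~Sample();                         // ordinary cleanup of the survivor
    return result;
}
```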
In practice, in all implementations, the compiler just uses hidden vptr (pointer to vtable) members, and these hidden members will be copied properly by `memcpy`, as if you did a plain member-wise copy of the C struct representing the polymorphic class (with all its hidden members). Bit-wise copies, or complete member-wise copies of that C struct (the complete C struct includes the hidden members), will behave exactly like a constructor call (as done by placement new), so all you have to do is let the compiler think you might have called placement new. If you do a strongly external function call (a call to a function that cannot be inlined and whose implementation cannot be examined by the compiler, like a call to a function defined in a dynamically loaded code unit, or a system call), then the compiler will just assume that such constructors could have been called by the code it cannot examine. Thus the behavior of `memcpy` here is defined not by the language standard, but by the compiler ABI (Application Binary Interface). The behavior of a strongly external function call is defined by the ABI, not just by the language standard. A call to a potentially inlinable function is defined by the language, as its definition can be seen (either during compilation or during link-time global optimization).
So in practice, given appropriate "compiler fences" (such as a call to an external function, or just `asm("")`), you can `memcpy` classes that only use virtual functions.
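As a rough sketch of what that looks like, and emphatically not something the language standard sanctions: the code below assumes GCC or Clang (for the `asm` syntax) and a vptr-based ABI such as the Itanium C++ ABI. `Shape`, `Triangle`, `compiler_fence` and `sides_of_bitwise_copy` are hypothetical names. The copy keeps the same dynamic type as the source, so only the ABI-level claim is being exercised.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical polymorphic classes: virtual functions only, no virtual bases.
struct Shape {
    virtual int sides() const { return 0; }
    virtual ~Shape() = default;
};
struct Triangle : Shape {
    int sides() const override { return 3; }
    int id = 0;
};

// Assumption: GCC/Clang extended asm. The pointer is passed in so the
// compiler must assume the bytes behind it may have been rewritten.
inline void compiler_fence(void* p) {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : : "r"(p) : "memory");
#else
    (void)p;  // no fence available in this sketch for other compilers
#endif
}

// Copies the bytes of `source` (hidden vptr included) into raw storage,
// then makes a virtual call through the copy. Relies on the ABI, not on
// the standard, which does not define this.
int sides_of_bitwise_copy(const Triangle& source) {
    alignas(Triangle) unsigned char storage[sizeof(Triangle)];
    std::memcpy(storage, &source, sizeof(Triangle));
    compiler_fence(storage);
    const Shape* copy = reinterpret_cast<const Triangle*>(storage);
    return copy->sides();
}
```

Calling `sides_of_bitwise_copy` on a `Triangle` dispatches through the copied vptr; on the assumed ABI it reaches `Triangle::sides`.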
Of course, the language semantics have to allow you to do such a placement new when you do a `memcpy`: you cannot willy-nilly redefine the dynamic type of an existing object and pretend you have not simply wrecked the old object. If you have a non-const global, static, automatic, member subobject, or array subobject, you can overwrite it and put another, unrelated object there; but if the dynamic type is different, you cannot pretend that it's still the same object or subobject:
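    #include <new>  // placement new

    struct A { virtual void f() {} };
    struct B : A { };

    void test() {
        A a;
        if (sizeof(A) != sizeof(B)) return;
        new (&a) B; // OK (assuming alignment is OK): a B now lives in a's storage
        a.f();      // undefined: the name a no longer refers to a live A
    }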
Changing the polymorphic type of an existing object is simply not allowed: the new object has no relation to `a` except for the region of memory, that is, the contiguous bytes starting at `&a`. They have different types.
[The standard is strongly divided on whether `*&a` (on typical flat-memory machines) or `(A&)(char&)a` (in any case) can be used to refer to the new object. Compiler writers are not divided: you should not do it. This is a deep defect in C++, perhaps the deepest and most troubling.]
But in portable code you cannot perform a bitwise copy of classes that use virtual inheritance, as some implementations implement those classes with pointers to the virtual base subobjects: these pointers, which were properly initialized by the constructor of the most derived object, would have their values copied by `memcpy` (like a plain member-wise copy of the C struct representing the class with all its hidden members) and so would still point into the source object, not to the subobject of the destination object!
Other ABIs use address offsets to locate these base subobjects; the offsets depend only on the type of the most derived object, like final overriders and `typeid`, and thus can be stored in the vtable. On these implementations, `memcpy` will work as guaranteed by the ABI (with the above limitation on changing the type of an existing object).
In either case, it is entirely an object representation issue, that is, an ABI issue.
`T`, if two pointers to `T` point to distinct `T` objects `obj1` and `obj2`, where neither `obj1` nor `obj2` is a base-class subobject, if the underlying bytes making up `obj1` are copied into `obj2`, `obj2` shall subsequently hold the same value as `obj1`". (emphasis mine) The subsequent sample uses `std::memcpy`. – Mooing Duck