11
votes

Following a question asked here earlier today and a multitude of similarly themed questions, I'm here to ask about this problem from the standard's viewpoint.

struct Base
{
  int member;
};

struct Derived : Base
{
  int another_member;
};

int main()
{
  Base* p = new Derived[10]; // (1)
  p[1].member = 42; // (2)
  delete[] p; // (3)
}

According to the standard, (1) is well-formed, because Derived* (the type of the new-expression's result) can be implicitly converted to Base* (C++11 draft, §4.10/3):

A prvalue of type “pointer to cv D”, where D is a class type, can be converted to a prvalue of type “pointer to cv B”, where B is a base class (Clause 10) of D. If B is an inaccessible (Clause 11) or ambiguous (10.2) base class of D, a program that necessitates this conversion is ill-formed. The result of the conversion is a pointer to the base class subobject of the derived class object. The null pointer value is converted to the null pointer value of the destination type.

(3) leads to undefined behaviour because of §5.3.5/3:

In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined. In the second alternative (delete array) if the dynamic type of the object to be deleted differs from its static type, the behavior is undefined.

Is (2) legal according to the standard, or does it lead to an ill-formed program or undefined behaviour?

edit: Better wording

4
Why are we assuming (2) is ill-formed? - Kerrek SB
(2) is very ill-formed because it uses sizeof(Base) to compute the distance between p[0] and p[1]. - Bo Persson
It's not ill-formed, it's just UB because p doesn't point to an element of an array object (the condition for pointer arithmetic to work); it points to a base class sub-object of an array element, so the array access is invalid. - CB Bailey
@Kerrek SB: Perhaps the last question should have been worded a little differently, but since major implementations (tested with gcc, clang and MSVC) don't get it right, I assumed (2) is ill-formed. I spent the last two hours searching for something like what Bo Persson said, i.e. that (p + n) uses the static type of p to compute the offset, but I got the feeling that the paragraph concerning operator+ doesn't imply this. - Vitus
@Charles Bailey: Oh, that indeed makes sense. Please do post it as an answer. - Vitus

4 Answers

4
votes

If you look at the expression p[1], p is a Base* (Base is a completely-defined type) and 1 is an int, so according to ISO/IEC 14882:2003 5.2.1 [expr.sub] this expression is valid and identical to *((p)+(1)).

From 5.7 [expr.add] / 5, when an integer is added to a pointer, the result is well defined only when the pointer points to an element of an array object and the result of the pointer arithmetic also points to an element of that array object or one past the end of the array. p, however, does not point to an element of an array object; it points at the base class sub-object of a Derived object. It is the Derived object that is an array member, not the Base sub-object.

Note that under 5.7 / 4, for the purposes of the addition operator, the Base sub-object can be treated as an array of size one, so technically you can form the address p + 1, but as a "one past the last element" pointer, it doesn't point at a Base object and attempting to read from or write to it will cause undefined behavior.
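As a minimal sketch of how to stay within these rules, the subscript can be applied to a Derived* (which does point to an array element), and only the resulting element address converted to Base*. The helper name base_of_element and the function demo are hypothetical and used for illustration only.

```cpp
#include <cassert>
#include <cstddef>

struct Base { int member; };
struct Derived : Base { int another_member; };

// Hypothetical helper: returns a pointer to the Base sub-object of element i.
// The subscript is applied to a Derived*, so the pointer arithmetic stays
// inside the array of Derived; only the element's address is converted.
Base* base_of_element(Derived* arr, std::size_t i)
{
    return &arr[i];  // implicit Derived* -> Base* conversion (4.10/3)
}

void demo()
{
    // Keep the Derived* for indexing and for delete[].
    Derived* arr = new Derived[10]();  // value-initialized: all members are 0

    base_of_element(arr, 1)->member = 42;

    assert(arr[1].member == 42);
    assert(arr[1].another_member == 0);  // the neighbouring member is untouched

    delete[] arr;  // static and dynamic element types agree here
}
```

Calling demo() exercises the pattern with the same array size as in the question; the pointer is never stored as a Base* to the whole array, so neither (2) nor (3) arises.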

4
votes

(3) leads to undefined behaviour, but it is not ill-formed strictly speaking. Ill-formed means that a C++ program is not constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.

The same goes for (2): it is well-formed, but it does not do what you probably expected. According to §8.3.4/6:

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.

So in (2) you will get the address that lies sizeof(Base)*1 bytes past p, when you probably wanted the address that lies sizeof(Derived)*1 bytes past p.
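The difference between the two strides can be observed without dereferencing anything. The following is a sketch under the assumption of a mainstream implementation where sizeof(Derived) is larger than sizeof(Base); the function names are hypothetical. The pointer p + 1 is only formed, which is allowed as a one-past-the-end pointer, and never read through.

```cpp
#include <cassert>
#include <cstddef>

struct Base { int member; };
struct Derived : Base { int another_member; };

// Byte distance covered by "+ 1" when the pointer's static type is Base*.
std::ptrdiff_t byte_offset_via_base(Derived* arr)
{
    Base* p = arr;    // points at the Base sub-object of arr[0]
    Base* q = p + 1;  // formed only, never dereferenced
    return reinterpret_cast<const char*>(q) - reinterpret_cast<const char*>(p);
}

// Byte distance covered by "+ 1" when the pointer's static type is Derived*.
std::ptrdiff_t byte_offset_via_derived(Derived* arr)
{
    Derived* q = arr + 1;  // points at the real second element
    return reinterpret_cast<const char*>(q) - reinterpret_cast<const char*>(arr);
}

void check_offsets()
{
    Derived arr[2] = {};
    assert(byte_offset_via_base(arr) == static_cast<std::ptrdiff_t>(sizeof(Base)));
    assert(byte_offset_via_derived(arr) == static_cast<std::ptrdiff_t>(sizeof(Derived)));
}
```

The step size follows the static type of the pointer, so the Base* version moves by sizeof(Base) while the actual elements are sizeof(Derived) apart.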

1
votes

The standard doesn't disallow (2), but it's dangerous nevertheless.

The problem is that doing p[1] means adding sizeof(Base) to the base address p, and using the data at that memory location as an instance of Base. But chances are very high that sizeof(Base) is smaller than sizeof(Derived), so you'll be interpreting a block of memory starting in the middle of a Derived object, as a Base object.

More information in C++ FAQ Lite 21.4.
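One commonly suggested mitigation, sketched here as an illustration rather than as a quotation from that FAQ entry, is to hold the elements in a container such as std::vector<Derived>. Unlike raw pointers, a container of Derived does not silently convert to a container of Base, so the mistake in (1) is rejected at compile time. The helper set_base_member is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct Base { int member; };
struct Derived : Base { int another_member; };

// The raw pointer conversion that makes (1) compile:
static_assert(std::is_convertible<Derived*, Base*>::value,
              "Derived* converts to Base* implicitly");

// The container types are unrelated, so no such conversion exists:
static_assert(!std::is_convertible<std::vector<Derived>*, std::vector<Base>*>::value,
              "vector<Derived>* does not convert to vector<Base>*");

// Hypothetical helper: indexing uses the container's real element type,
// and only the selected element is viewed as a Base.
void set_base_member(std::vector<Derived>& v, std::size_t i, int value)
{
    Base& b = v.at(i);  // at() throws std::out_of_range on a bad index
    b.member = value;
}
```

For example, with std::vector<Derived> v(10); the call set_base_member(v, 1, 42); writes to the second element's member, and the vector releases its storage with the correct element type.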

0
votes
p[1].member = 42; 

is well-formed. The static type of *p is Base, while the dynamic type of the object it points to is Derived. p[1] is equivalent to *(p+1), which looks valid and appears to refer to the second element of an array of Base.

However, the array elements are actually of type Derived. The code p[1].member = 42; shows that you think you are referring to an array element of type Base.