25
votes

Sample code:

struct S { int x; };

int func()
{
     S s{2};
     return (int &)s;    // Equivalent to *reinterpret_cast<int *>(&s)
}

I believe this is common and considered acceptable. The standard does guarantee that there is no initial padding in the struct. However, this case is not listed in the strict aliasing rule (C++17 [basic.lval]/11):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • (11.1) the dynamic type of the object,
  • (11.2) a cv-qualified version of the dynamic type of the object,
  • (11.3) a type similar (as defined in 7.5) to the dynamic type of the object,
  • (11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • (11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • (11.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • (11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • (11.8) a char, unsigned char, or std::byte type.

It seems clear that the object s is having its stored value accessed.

The types listed in the bullet points are the type of the glvalue doing the access, not the type of the object being accessed. In this code the glvalue type is int which is not an aggregate or union type, ruling out 11.6.

My question is: Is this code correct, and if so, under which of the above bullet points is it allowed?

3
I'm mainly familiar with the C Standard rather than the C++ Standard, but the authors of the former didn't think it necessary to specify any situations where an lvalue of an aggregate member type can actually be used to access the aggregate. Even something like myStruct.member=23; invokes UB unless member has a character type, but a compiler would have to be rather obtuse not to recognize such usage. A compiler would likewise have to be obtuse not to recognize situations where a pointer which is freshly converted to a member type is used to access that member. The Standard, however... - supercat
...does not mandate such behavior, but relies upon compiler writers recognizing that quality implementations should behave usefully even in cases not mandated by the Standard. Unfortunately, such reliance turns out to have been misplaced. - supercat

3 Answers

18
votes

The behaviour of the cast comes down to [expr.static.cast]/13:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T , then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

The definition of pointer-interconvertible is:

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or
  • one is a union object and the other is a non-static data member of that object, or
  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object, or
  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

So in the original code, s and s.x are pointer-interconvertible and it follows that (int &)s actually designates s.x.

So, for the purposes of the strict aliasing rule, the object whose stored value is being accessed is s.x, not s. Its dynamic type is int, which matches the type of the glvalue (11.1), so there is no problem: the code is correct.
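
As a rough illustration of the argument above, here is a minimal sketch. The helper name read_first_member is hypothetical (it is not from the question), and the static_asserts only check the preconditions the quoted rule relies on; they are not a proof of conformance.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct S { int x; };

// The pointer-interconvertibility rule quoted above applies to standard-layout classes.
static_assert(std::is_standard_layout<S>::value, "S must be standard-layout");
// A standard-layout class has no padding before its first member.
static_assert(offsetof(S, x) == 0, "first member must sit at offset 0");

// Hypothetical helper: reads s.x through a pointer obtained from &s.
int read_first_member(S& s)
{
    int* p = reinterpret_cast<int*>(&s);  // per the quoted rule, this points to s.x
    assert(p == &s.x);                    // same address as the member
    return *p;                            // int glvalue accessing an int object
}
```

Calling read_first_member on an S initialized as S s{2} should return 2, the same value the question's func() returns.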

6
votes

I think it's in expr.reinterpret.cast#11

A glvalue expression of type T1, designating an object x, can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result is that of *reinterpret_cast<T2 *>(p) where p is a pointer to x of type “pointer to T1”. No temporary is created, no copy is made, and no constructors or conversion functions are called [1].

[1] This is sometimes referred to as a type pun when the result refers to the same object as the source glvalue

Supporting @M.M's answer about pointer-interconvertible objects:

from cppreference:

Assuming that alignment requirements are met, a reinterpret_cast does not change the value of a pointer outside of a few limited cases dealing with pointer-interconvertible objects:

struct S { int a; } s;


int* p = reinterpret_cast<int*>(&s); // value of p is "pointer to s.a" because s.a
                                     // and s are pointer-interconvertible
*p = 2; // s.a is also 2

versus

struct S { int a; };

S s{2};
int i = (int &)s;    // Equivalent to *reinterpret_cast<int *>(&s)
                     // i is a copy, so modifying i does not change s.a
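
To make the contrast between the two snippets concrete, here is a small sketch. The function name write_through_reference is hypothetical; it assumes, as argued in the other answer, that (int &)s designates s.a.

```cpp
#include <cassert>

struct S { int a; };

// Hypothetical helper: returns true if a write to the copy leaves s.a alone
// while a write through the reference reaches s.a.
bool write_through_reference()
{
    S s{2};

    int copy = (int &)s;   // copy-initialized from the value of s.a
    copy = 7;              // modifies only the local copy
    bool copy_left_s_alone = (s.a == 2);

    int& ref = (int &)s;   // binds directly to s.a, no copy is made
    ref = 9;               // writes to s.a

    return copy_left_s_alone && s.a == 9;
}
```
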
1
votes

The cited rule is derived from a similar rule in C89 which would be nonsensical as written unless one stretches the meaning of the word "by", or recognizes what "Undefined Behavior" meant when C89 was written. Given something like struct S {unsigned dat[10];}s;, the statement s.dat[1]++; would clearly modify the stored value of s, but the only lvalue of type struct S in that expression is used solely for the purpose of producing a value of type unsigned*. The only lvalue which is used to modify any object is of type unsigned.

As I see it, there are two related ways of resolving this issue: (1) recognizing that the authors of the Standard wanted to allow cases where an lvalue of one type was visibly derived from one of another type, but didn't want to get hung up on details of what forms of visible derivation must be accounted for, especially since the range of cases compilers would need to recognize would vary considerably based upon the styles of optimization they performed and the tasks for which they were being used; (2) recognizing that the authors of the Standard had no reason to think it should matter whether the Standard actually required that a particular construct be processed usefully, if it would have been clear to everyone that there was no reason to do otherwise.

I don't think there has been consensus among the Committee members over whether a compiler given something like:

struct foo {int ct; int *dat;} it;
void test(void)
{
  for (int i=0; i < it.ct; i++)
    it.dat[i] = 0;
}

should be required to ensure that e.g. after it.ct = 1234; it.dat = &it.ct;, a call to test(); would zero it.ct and have no other effect. Parts of the Rationale would suggest that at least some committee members would have expected so, but the omission of any rule that would allow for an object of structure type to be accessed using an arbitrary lvalue of member type suggests otherwise. The C Standard has never really resolved this issue, and the C++ Standard cleans things up somewhat but doesn't really solve it either.
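
For anyone who wants to try the scenario, here is a sketch of a driver written as C++. The function run_aliased_case is a hypothetical addition; foo, it, and test are from the snippet above. It assumes the compiler treats the store through it.dat as possibly modifying it.ct and therefore reloads it.ct on each iteration, which is what I would expect from mainstream compilers, but this sketch does not settle what the Standard requires.

```cpp
#include <cassert>

struct foo { int ct; int* dat; } it;

void test()
{
    for (int i = 0; i < it.ct; i++)
        it.dat[i] = 0;
}

// Hypothetical driver: points it.dat at it.ct, runs test(), and reports it.ct.
int run_aliased_case()
{
    it.ct = 1234;
    it.dat = &it.ct;   // it.dat[0] now aliases it.ct
    test();            // first iteration zeroes it.ct, so the loop should stop
    return it.ct;
}
```

If it.ct were instead read once before the loop, the loop would write far past it.ct, which is exactly the outcome under discussion.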