4
votes

According to this Stack Overflow answer about the C++11/14 strict aliasing rules:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,

  • a cv-qualified version of the dynamic type of the object,

  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

Can we access the storage of an object of another type using

(1) char *

(2) char(&)[N]

(3) std::array<char, N> &

without depending on undefined behavior?

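#include <array>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <memory>
#include <string>
#include <string_view>

constexpr uint64_t lil_endian = 0x65'6e'64'69'61'6e;
    // a.k.a. Clockwise-Rotated Endian, which on a little-endian machine is laid out as
    // char[8] = { n,a,i,d,n,e,\0,\0 }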

const auto& arr =   // std::array<char,8> &
    reinterpret_cast<const std::array<char,8> &> (lil_endian);

const auto& carr =  // char(&)[8]
    reinterpret_cast<const char(&)[8]>           (lil_endian);

const auto* p =     // char *
    reinterpret_cast<const char *>(std::addressof(lil_endian));

int main()
{
    const auto str1  = std::string(arr.crbegin()+2, arr.crend() );

    const auto str2  = std::string(std::crbegin(carr)+2, std::crend(carr) );

    const auto sv3r  = std::string_view(p, 8);
    const auto str3  = std::string(sv3r.crbegin()+2, sv3r.crend() );

    auto lam = [](const auto& str) {
        std::cout << str << '\n'
                  << str.size() << '\n' << '\n' << std::hex;
        for (const auto ch : str) {
            std::cout << ch << " : " << static_cast<uint32_t>(ch) << '\n';
        }
        std::cout << '\n' << '\n' << std::dec;
    };

    lam(str1);
    lam(str2);
    lam(str3);
}

All lambda invocations produce:

endian
6

e : 65
n : 6e
d : 64
i : 69
a : 61
n : 6e

godbolt.org/g/cdDTAM (enable -fstrict-aliasing -Wstrict-aliasing=2 )

wandbox.org/permlink/pGvPCzNJURGfEki7

2 Answers

3
votes

Both the char(&)[N] case and the std::array<char, N> case result in undefined behavior. The reason is in the rule you quoted: neither char(&)[N] nor std::array<char, N> is the same type as char.

I am not sure about the char * case, because the current standard does not explicitly say that an object can be viewed as an array of narrow characters (see here for further discussion).

In any case, if you want to access the underlying bytes of an object, use std::memcpy, as the standard explicitly says in [basic.types]/2:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value. [ Example:

#define N sizeof(T)
char buf[N];
T obj;                          // obj initialized to its original value
std::memcpy(buf, &obj, N);      // between these two calls to std::memcpy, obj might be modified
std::memcpy(&obj, buf, N);      // at this point, each subobject of obj of scalar type holds its original value

— end example ]
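Applied to the question, the memcpy route could look roughly like the sketch below. The helper names bytes_of, from_bytes and reversed_text are hypothetical (they are not from the question or the standard); the sketch only assumes a trivially copyable type, as [basic.types]/2 requires.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: copies the underlying bytes of a trivially
// copyable object into a std::array<char, sizeof(T)>.
template <class T>
std::array<char, sizeof(T)> bytes_of(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only specified for trivially copyable types");
    std::array<char, sizeof(T)> out{};
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Hypothetical helper: copies the bytes back into a fresh object of type T.
template <class T>
T from_bytes(const std::array<char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only specified for trivially copyable types");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}

// Mirrors the question: walk the bytes in reverse order, skipping the
// first `skip` bytes of the reversed sequence.
std::string reversed_text(std::uint64_t value, std::size_t skip) {
    const auto bytes = bytes_of(value);
    assert(skip <= bytes.size());
    return std::string(bytes.crbegin() + static_cast<std::ptrdiff_t>(skip),
                       bytes.crend());
}
```

Calling reversed_text(lil_endian, 2) with the constant from the question should give the same six characters as the original program on a little-endian machine, without any reinterpret_cast.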

2
votes

The strict aliasing rule is in fact very simple: two objects with overlapping lifetimes cannot occupy overlapping storage unless one is a subobject of the other. (*)

Nevertheless, reading the memory representation of an object is allowed. That representation is a sequence of unsigned char, per [basic.types]/4:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T.

Accordingly, in your example:

  • lam(str1) is UB (Undefined Behavior);
  • lam(str2) is UB (an array and its first element are not pointer-interconvertible);
  • lam(str3) is not stated as UB in the standard; if you replace char with unsigned char, one could argue that you are reading the object representation (this is not explicitly defined either, but it should work on all compilers).

So using the third case and changing the declaration of p to const unsigned char* should always produce the expected result. The other two cases can work in this simple example, but may break if the code becomes more complicated or with a newer compiler version.
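A minimal sketch of that third approach, assuming the const unsigned char* reading of the object representation described above. The function name reversed_bytes_via_uchar is hypothetical and only illustrates the idea; as noted, the standard does not spell this out explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical helper: reads the object representation of `object`
// through a const unsigned char* and returns the bytes in reverse
// order, skipping the first `skip` bytes of the reversed sequence.
template <class T>
std::string reversed_bytes_via_uchar(const T& object, std::size_t skip) {
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(std::addressof(object));
    std::string out;
    if (skip >= sizeof(T)) {
        return out;  // nothing left to read
    }
    // Walk from the highest remaining byte index down to index 0.
    for (std::size_t i = sizeof(T) - skip; i > 0; --i) {
        out.push_back(static_cast<char>(p[i - 1]));
    }
    return out;
}
```

With the question's constant, reversed_bytes_via_uchar(lil_endian, 2) plays the role of str3; the bytes it sees still depend on the machine's endianness.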


(*) There are two exceptions to this rule: one for union members that share a common initial sequence, and one for an array of unsigned char or std::byte that provides storage for another object.
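The second exception can be illustrated with placement new into a suitably aligned unsigned char buffer. This is only a sketch; the function name make_in_buffer is hypothetical, and the buffer is assumed to be correctly sized and aligned for the object created in it.

```cpp
#include <cstdint>
#include <new>

// Hypothetical helper: creates a std::uint64_t inside an unsigned char
// array, so the array and the new object legitimately share storage.
std::uint64_t make_in_buffer(std::uint64_t v) {
    // Assumption: the buffer must be large enough and suitably aligned.
    alignas(std::uint64_t) unsigned char storage[sizeof(std::uint64_t)];
    // Placement new starts the lifetime of a std::uint64_t in the buffer.
    std::uint64_t* obj = ::new (static_cast<void*>(storage)) std::uint64_t(v);
    // Access goes through the pointer returned by new, not through a cast
    // of the buffer itself.
    return *obj;
}
```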