What is the significance of the special language in the standard for lvalue-to-rvalue conversions of unsigned character types with indeterminate values?

Question

In the C++14 draft standard (N3797), the section on lvalue-to-rvalue conversions reads as follows (emphasis mine):

4.1 Lvalue-to-rvalue-conversion [conv.lval]

A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.

When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:

If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.

Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.

Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.

Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.

Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.

Otherwise, the object indicated by the glvalue is the prvalue result.

[Note: See also 3.10]

What's the significance of the emphasized paragraph, the one beginning "Otherwise, if T is a (possibly cv-qualified) unsigned character type"?

Without this paragraph, the situations it covers would fall through to the next rule and be undefined behavior. Normally, I would expect that accessing an unsigned char object while it holds an indeterminate value leads to undefined behavior. But this paragraph means that:

If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
If the unsigned char does not have automatic storage duration,

then the conversion yields an unspecified value rather than undefined behavior.

Am I correct to conclude that this program:

#include <new>
#include <iostream>

// using T = int;
using T = unsigned char;

int main() {
  T * array = new T[500];
  for (int i = 0; i < 500; ++i) {
    std::cout << static_cast<int>(array[i]) << std::endl;
  }
  delete[] array;
}

is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program where T = int, would have undefined behavior?

IIUC, one of the reasons to make it UB to read things with indeterminate values, is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.

Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char objects that have indeterminate values and get unspecified results instead of UB? I have the feeling that if the committee put this much effort into crafting this part of the rule, they had some motivation: to help certain code examples that they cared about, to be consistent with some other part of the standard, or to simplify some other issue. But I have no idea what that might be.

I think it has more to do with the fact that on some architectures, interpreting certain raw memory values ("trap values") as a signed type will cause an interrupt, but there are no (or very few) known architectures where this is true for an unsigned type. So this clause is there to allow your program to crash and burn on those architectures when you happen to interpret one of those trap values as signed. — cdhowie
That wording is no longer there in the more recent draft, N3936, updated per N3914 (scroll to the bottom or search for 1787). — Igor Tandetnik
N3936 is a pre-C++14 draft. N4140 is C++14 (except for cover page) — M.M
This feature would be useful for printing out representations of variables with possibly invalid or indeterminate values (by aliasing them as a series of unsigned char) — M.M
The stuff about automatic variables, address taken etc. is the "Itanium clause", see here for discussion of that architecture — M.M

supercat · Accepted Answer · 2017-09-06T16:36:07

In many situations, code will write some parts of a POD structure or array without writing everything, and then use functions like memcpy or fwrite to copy or write out the entire thing without regard for which parts had assigned values and which did not. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.
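As a rough illustration of that pattern, here is a minimal sketch. The Record type and the fill_record and duplicate helpers are hypothetical names made up for this example, not anything from the standard or the question. Only a short prefix of the name buffer is ever written, yet the whole object is copied bytewise, and the copy only ever reads back the parts that were assigned.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical record type, used only for illustration.
struct Record {
    int  id;
    char name[32];  // typically only a short prefix is ever written
};

// memcpy between objects is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<Record>::value,
              "Record must be trivially copyable to be copied bytewise");

// Writes only the parts anyone will "care" about: the id, the characters
// of the name, and the terminating NUL. The tail of r.name is left
// untouched, so for a freshly declared local Record it stays indeterminate.
inline void fill_record(Record& r, int id, const char* name) {
    r.id = id;
    std::size_t len = std::strlen(name);
    if (len >= sizeof r.name) {
        len = sizeof r.name - 1;  // truncate so the NUL still fits
    }
    std::memcpy(r.name, name, len);
    r.name[len] = '\0';
}

// Copies the entire object representation, unwritten bytes included,
// without regard for which bytes hold assigned values.
inline void duplicate(Record& dst, const Record& src) {
    std::memcpy(&dst, &src, sizeof(Record));
}
```

After fill_record(a, 7, "abc") and duplicate(b, a), reading b.id and the string in b.name is fine, because those bytes were written before the copy. The unwritten tail of the buffer was copied too, but nothing ever inspects it, so there was no need to spend time clearing it first.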
