Yes, ISO C++ allows (but doesn't require) implementations to make this choice.
But also note that ISO C++ allows a compiler to emit code that crashes on purpose (e.g. with an illegal instruction) if the program encounters UB, e.g. as a way to help you find errors. (Or because it's a DeathStation 9000. Being strictly conforming is not sufficient for a C++ implementation to be useful for any real purpose.) So ISO C++ would allow a compiler to make asm that crashed (for totally different reasons) even on similar code that read an uninitialized uint32_t, even though that's required to be a fixed-layout type with no trap representations.
It's an interesting question about how real implementations work, but remember that even if the answer was different, your code would still be unsafe because modern C++ is not a portable version of assembly language.
You're compiling for the x86-64 System V ABI, which specifies that a bool as a function arg in a register is represented by the bit-patterns false=0 and true=1 in the low 8 bits of the register (see Footnote 1). In memory, bool is a 1-byte type that again must have an integer value of 0 or 1.
(An ABI is a set of implementation choices that compilers for the same platform agree on so they can make code that calls each other's functions, including type sizes, struct layout rules, and calling conventions.)
ISO C++ doesn't specify it, but this ABI decision is widespread because it makes bool->int conversion cheap (just zero-extension). I'm not aware of any ABIs that don't let the compiler assume 0 or 1 for bool, for any architecture (not just x86). It allows optimizations like !mybool with xor eax,1 to flip the low bit (see "Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction"), or compiling a&&b to a bitwise AND for bool types. Some compilers do actually take advantage of this: see "Boolean values as 8 bit in compilers. Are operations on them inefficient?".
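As a rough sketch of the kind of source these optimizations apply to (the function names are made up for illustration, and the asm mentioned in the comments is only what a compiler is allowed to emit, not a promise about any particular version):

```cpp
// Hypothetical helpers, not from the question.
// If bool is guaranteed to be 0 or 1, none of these need a branch or a compare.

bool logical_not(bool b) {
    return !b;              // may compile to something like: xor eax, 1
}

bool logical_and(bool a, bool b) {
    return a && b;          // operands have no side effects, so a plain bitwise AND works
}

int to_int(bool b) {
    return b;               // bool -> int is just zero-extension of the 0/1 byte
}
```

All three are only correct as single instructions because the compiler assumes the input byte is exactly 0 or 1.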
In general, the as-if rule allows the compiler to take advantage of things that are true on the target platform being compiled for, because the end result will be executable code that implements the same externally-visible behaviour as the C++ source. (With all the restrictions that Undefined Behaviour places on what is actually "externally visible": not with a debugger, but from another thread in a well-formed / legal C++ program.)
The compiler is definitely allowed to take full advantage of an ABI guarantee in its code-gen, and make code like you found which optimizes strlen(whichString) to 5U - boolValue. (BTW, this optimization is kind of clever, but maybe shortsighted vs. branching and inlining memcpy as stores of immediate data; see Footnote 2.)
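To see why that transformation is valid for well-formed programs, here is a minimal sketch. The two function names are hypothetical, and the string selection is my reconstruction of what the question's code does based on the names used above:

```cpp
#include <cstddef>
#include <cstring>

// What the source asks for: pick a string, then measure it.
std::size_t length_via_strlen(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    return std::strlen(whichString);
}

// What the compiler may compute instead, assuming boolValue is 0 or 1:
// strlen("false") == 5 and strlen("true") == 4.
std::size_t length_via_subtraction(bool boolValue) {
    return 5U - boolValue;
}
```

For the only two values a valid bool can have, both functions agree, so the as-if rule lets the compiler use the second form.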
Or the compiler could have created a table of pointers and indexed it with the integer value of the bool, again assuming it was a 0 or 1. (This possibility is what @Barmar's answer suggested.)
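Written out by hand, the table-lookup strategy would look roughly like this (pick_string is a hypothetical name; a compiler doing this internally wouldn't add a bounds check either):

```cpp
#include <cstddef>

// Hypothetical sketch of the table-lookup code-gen strategy.
const char* pick_string(bool boolValue) {
    static const char* const table[2] = {"false", "true"};
    // Safe in valid C++: a bool converts to exactly 0 or 1.
    return table[static_cast<std::size_t>(boolValue)];
}
```

If the asm equivalent of this indexed with a garbage byte like 0xAA, it would read a pointer from far outside the table.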
Your __attribute((noinline)) constructor with optimization enabled led to clang just loading a byte from the stack to use as uninitializedBool. It made space for the object in main with push rax (which is smaller and for various reasons about as efficient as sub rsp, 8), so whatever garbage was in AL on entry to main is the value it used for uninitializedBool. This is why you actually got values that weren't just 0.
5U - random garbage can easily wrap to a large unsigned value, leading memcpy to go into unmapped memory. The destination is in static storage, not the stack, so you're not overwriting a return address or something.
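The wrapping itself is ordinary, well-defined unsigned arithmetic, so it can be demonstrated safely. In this sketch raw_byte is a hypothetical stand-in for whatever was in AL, passed as an unsigned char so there's no UB in the demo itself:

```cpp
#include <cstddef>

// Hypothetical model of the length computation on a zero-extended byte.
std::size_t length_from_raw_byte(unsigned char raw_byte) {
    // For raw_byte 0 or 1 this is 5 or 4.
    // For anything above 5 the subtraction wraps modulo 2^N (N = bits in size_t).
    return std::size_t{5} - raw_byte;
}
```

With raw_byte = 0xAA the result is within a couple of hundred of SIZE_MAX, which is the length memcpy would be asked to copy.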
Other implementations could make different choices, e.g. false=0 and true=any non-zero value. Then clang probably wouldn't make code that crashes for this specific instance of UB. (But it would still be allowed to if it wanted to.) I don't know of any implementations that choose anything other than what x86-64 does for bool, but the C++ standard allows many things that nobody does or even would want to do on hardware that's anything like current CPUs.
ISO C++ leaves it unspecified what you'll find when you examine or modify the object representation of a bool (e.g. by memcpying the bool into unsigned char, which you're allowed to do because char* can alias anything. And unsigned char is guaranteed to have no padding bits, so the C++ standard does formally let you hexdump object representations without any UB. Pointer-casting to copy the object representation is different from assigning char foo = my_bool, of course, so booleanization to 0 or 1 wouldn't happen and you'd get the raw object representation).
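A minimal sketch of inspecting the object representation that way (object_representation is a hypothetical helper, and the static_assert encodes the assumption of a 1-byte bool, which holds on x86-64 System V but isn't required by ISO C++):

```cpp
#include <cstring>

// Hypothetical helper: copy the raw byte of an *initialized* bool.
unsigned char object_representation(bool value) {
    static_assert(sizeof(bool) == 1, "sketch assumes a 1-byte bool");
    unsigned char raw;
    std::memcpy(&raw, &value, sizeof raw);  // no conversion, just the stored byte
    return raw;
}
```

On ABIs like x86-64 System V this should give 0 for false and 1 for true; ISO C++ itself doesn't promise those specific values.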
You've partially "hidden" the UB on this execution path from the compiler with noinline. Even if it doesn't inline, though, interprocedural optimizations could still make a version of the function that depends on the definition of another function. (First, clang is making an executable, not a Unix shared library where symbol-interposition can happen. Second, the definition is inside the class{} definition so all translation units must have the same definition, like with the inline keyword.)
So a compiler could emit just a ret or ud2 (illegal instruction) as the definition for main, because the path of execution starting at the top of main unavoidably encounters Undefined Behaviour. (Which the compiler can see at compile time if it decided to follow the path through the non-inline constructor.)
Any program that encounters UB is totally undefined for its entire existence. But UB inside a function or if() branch that never actually runs doesn't corrupt the rest of the program. In practice that means that compilers can decide to emit an illegal instruction, or a ret, or not emit anything and fall into the next block / function, for the whole basic block that can be proven at compile time to contain or lead to UB.
GCC and Clang in practice do actually sometimes emit ud2 on UB, instead of even trying to generate code for paths of execution that make no sense. Or for cases like falling off the end of a non-void function, gcc will sometimes omit a ret instruction. If you were thinking that "my function will just return with whatever garbage is in RAX", you are sorely mistaken. Modern C++ compilers don't treat the language like a portable assembly language any more. Your program really has to be valid C++, without making assumptions about how a stand-alone non-inlined version of your function might look in asm.
Another fun example is "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?". x86 doesn't fault on unaligned integers, right? So why would a misaligned uint16_t* be a problem? Because alignof(uint16_t) == 2, and violating that assumption led to a segfault when auto-vectorizing with SSE2.
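The usual portable way to avoid that class of problem is to never form the misaligned uint16_t* at all and go through memcpy instead. A small sketch (load_u16_unaligned is a hypothetical name, not from the linked question):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a uint16_t from an address with any alignment.
std::uint16_t load_u16_unaligned(const unsigned char* bytes) {
    std::uint16_t value;
    // memcpy makes no alignment assumption about the source;
    // compilers typically turn a small fixed-size copy like this into a single load.
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Because the compiler never sees a uint16_t* here, it has no alignment promise to build an auto-vectorized loop on.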
See also What Every C Programmer Should Know About Undefined Behavior #1/3, an article by a clang developer.
Key point: if the compiler noticed the UB at compile time, it could "break" (emit surprising asm) the path through your code that causes UB even if targeting an ABI where any bit-pattern is a valid object representation for bool.
Expect total hostility toward many mistakes by the programmer, especially things modern compilers warn about. This is why you should use -Wall and fix warnings. C++ is not a user-friendly language, and something in C++ can be unsafe even if it would be safe in asm on the target you're compiling for. (e.g. signed overflow is UB in C++ and compilers will assume it doesn't happen, even when compiling for 2's complement x86, unless you use clang/gcc -fwrapv.)
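For the signed-overflow case, one way to stay inside defined behaviour is to check before adding rather than after. This is just a sketch with a hypothetical helper name; GCC and clang also have builtins for this, which I'm not using here to keep it plain ISO C++:

```cpp
#include <limits>

// Hypothetical helper: true if a + b would overflow int.
// Every operation here is itself overflow-free.
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > std::numeric_limits<int>::max() - b;
    }
    return a < std::numeric_limits<int>::min() - b;
}
```

Checking afterwards with something like (a + b < a) doesn't work reliably, because the compiler is allowed to assume the overflow never happened.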
Compile-time-visible UB is always dangerous, and it's really hard to be sure (with link-time optimization) that you've really hidden UB from the compiler and can thus reason about what kind of asm it will generate.
Not to be over-dramatic; often compilers do let you get away with some things and emit code like you're expecting even when something is UB. But maybe it will be a problem in the future if compiler devs implement some optimization that gains more info about value-ranges (e.g. that a variable is non-negative, maybe allowing it to optimize sign-extension to free zero-extension on x86-64). For example, in current gcc and clang, doing tmp = a+INT_MIN doesn't optimize a<0 as always-false, only that tmp is always negative. (Because INT_MIN + a=INT_MAX is negative on this 2's complement target, and a can't be any higher than that.)
So gcc/clang don't currently backtrack to derive range info for the inputs of a calculation, only on the results based on the assumption of no signed overflow: example on Godbolt. I don't know if this optimization is intentionally "missed" in the name of user-friendliness or what.
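Something along these lines shows the shape of the test case (this is my own hypothetical reconstruction, not necessarily the code behind the Godbolt link; Flags and classify are made-up names):

```cpp
#include <climits>

struct Flags {
    bool tmp_negative;
    bool a_negative;
};

// Only defined for a >= 0: any negative a makes a + INT_MIN overflow.
Flags classify(int a) {
    int tmp = a + INT_MIN;
    // Per the discussion above, compilers fold tmp < 0 to true,
    // but they still actually test a < 0 instead of assuming it's false.
    return { tmp < 0, a < 0 };
}
```

A compiler that did backtrack could legally return a_negative = false unconditionally, since a negative a would already have hit UB on the line before.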
Also note that implementations (aka compilers) are allowed to define behaviour that ISO C++ leaves undefined. For example, all compilers that support Intel's intrinsics (like _mm_add_ps(__m128, __m128) for manual SIMD vectorization) must allow forming mis-aligned pointers, which is UB in C++ even if you don't dereference them. __m128i _mm_loadu_si128(const __m128i *) does unaligned loads by taking a misaligned __m128i* arg, not a void* or char*. See "Is `reinterpret_cast`ing between hardware vector pointer and the corresponding type an undefined behavior?".
GNU C/C++ also defines the behaviour of left-shifting a negative signed number (even without -fwrapv), separately from the normal signed-overflow UB rules. (This is UB in ISO C++ before C++20, while right shifts of signed numbers were implementation-defined (logical vs. arithmetic); good quality implementations choose arithmetic on HW that has arithmetic right shifts, but ISO C++ didn't specify it until C++20.) This is documented in the GCC manual's Integer section, along with defining implementation-defined behaviour that C standards require implementations to define one way or another.
There are definitely quality-of-implementation issues that compiler developers care about; they generally aren't trying to make compilers that are intentionally hostile, but taking advantage of all the UB potholes in C++ (except ones they choose to define) to optimize better can be nearly indistinguishable at times.
Footnote 1: The upper 56 bits can be garbage which the callee must ignore, as usual for types narrower than a register.
(Other ABIs do make different choices here. Some do require narrow integer types to be zero- or sign-extended to fill a register when passed to or returned from functions, like MIPS64 and PowerPC64. See the last section of this x86-64 answer which compares vs. those earlier ISAs.)
For example, a caller might have calculated a & 0x01010101 in RDI and used it for something else, before calling bool_func(a&1). The caller could optimize away the &1 because it already did that to the low byte as part of and edi, 0x01010101, and it knows the callee is required to ignore the high bytes.
Or if a bool is passed as the 3rd arg, maybe a caller optimizing for code-size loads it with mov dl, [mem] instead of movzx edx, [mem], saving 1 byte at the cost of a false dependency on the old value of RDX (or other partial-register effect, depending on CPU model). Or for the first arg, mov dil, byte [r10] instead of movzx edi, byte [r10], because both require a REX prefix anyway.
This is why clang emits movzx eax, dil in Serialize, instead of sub eax, edi. (For integer args, clang violates this ABI rule, instead depending on the undocumented behaviour of gcc and clang to zero- or sign-extend narrow integers to 32 bits; see "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?". So I was interested to see that it doesn't do the same thing for bool.)
Footnote 2: After branching, you'd just have a 4-byte mov-immediate, or a 4-byte + 1-byte store. The length is implicit in the store widths + offsets.
OTOH, glibc memcpy will do two 4-byte loads/stores with an overlap that depends on length, so this really does end up making the whole thing free of conditional branches on the boolean. See the L(between_4_7): block in glibc's memcpy/memmove. Or at least, memcpy's branching to select a chunk size goes the same way for either value of the boolean.
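The overlapping-access trick is easy to express in C++ if you want to see how one code path covers every length from 4 to 7. This is only an illustration of the idea with a hypothetical function name, not glibc's actual implementation (which is hand-written asm):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: copy len bytes, 4 <= len <= 7, with no branch on len.
void copy_4_to_7(unsigned char* dst, const unsigned char* src, std::size_t len) {
    assert(len >= 4 && len <= 7);
    std::uint32_t head, tail;
    std::memcpy(&head, src, 4);            // first 4 bytes
    std::memcpy(&tail, src + len - 4, 4);  // last 4 bytes, overlapping head when len < 8
    // Both loads happen before either store, so overlapping buffers would also work.
    std::memcpy(dst, &head, 4);
    std::memcpy(dst + len - 4, &tail, 4);
}
```

For "true" (len 4) the two accesses overlap completely; for "false" (len 5) they overlap by 3 bytes. Either way it's the same four instructions with a different offset.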
If inlining, you could use 2x mov-immediate + cmov and a conditional offset, or you could leave the string data in memory.
Or if tuning for Intel Ice Lake (with the Fast Short REP MOV feature), an actual rep movsb might be optimal. glibc memcpy might start using rep movsb for small sizes on CPUs with that feature, saving a lot of branching.
Tools for detecting UB and usage of uninitialized values
In gcc and clang, you can compile with -fsanitize=undefined to add run-time instrumentation that will warn or error out on UB that happens at runtime. That won't catch uninitialized variables, though. (Because it doesn't increase type sizes to make room for an "uninitialized" bit).
See https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/
To find usage of uninitialized data, there's Address Sanitizer and Memory Sanitizer in clang/LLVM. https://github.com/google/sanitizers/wiki/MemorySanitizer shows examples of clang -fsanitize=memory -fPIE -pie detecting uninitialized memory reads. It might work best if you compile without optimization, so all reads of variables end up actually loading from memory in the asm. They show it being used at -O2 in a case where the load wouldn't optimize away. I haven't tried it myself. (In some cases, e.g. not initializing an accumulator before summing an array, clang -O3 will emit code that sums into a vector register that it never initialized. So with optimization, you can have a case where there's no memory read associated with the UB. But -fsanitize=memory changes the generated asm, and might result in a check for this.)
It will tolerate copying of uninitialized memory, and also simple logic and arithmetic operations with it. In general, MemorySanitizer silently tracks the spread of uninitialized data in memory, and reports a warning when a code branch is taken (or not taken) depending on an uninitialized value.
MemorySanitizer implements a subset of functionality found in Valgrind (Memcheck tool).
It should work for this case because the call to glibc memcpy with a length calculated from uninitialized memory will (inside the library) result in a branch based on length. If it had inlined a fully branchless version that just used cmov, indexing, and two stores, it might not have worked.
Valgrind's memcheck will also look for this kind of problem, again not complaining if the program simply copies around uninitialized data. But it says it will detect when a "Conditional jump or move depends on uninitialised value(s)", to try to catch any externally-visible behaviour that depends on uninitialized data.
Perhaps the idea behind not flagging just a load is that structs can have padding, and copying the whole struct (including padding) with a wide vector load/store is not an error even if the individual members were only written one at a time. At the asm level, the information about what was padding and what is actually part of the value has been lost.
true” is a rule about Boolean operations including “assignment to a bool” (which might implicitly invoke a static_cast<bool>() depending on specifics). It is however not a requirement about the internal representation of a bool chosen by the compiler. – Euro Micelli