3 votes

We all know that type punning like this

union U {float a; int b;};

U u;
std::memset(&u, 0, sizeof u);
u.a = 1.0f;
std::cout << u.b;

is undefined behavior in C++.

It is undefined because after the u.a = 1.0f; assignment, .a becomes the active member and .b becomes an inactive member, and it is undefined behavior to read from an inactive member. We all know this.


Now, consider the following code:
union U {float a; int b;};

U u;
std::memset(&u, 0, sizeof u);
u.a = 1.0f;

char *ptr = new char[std::max(sizeof (int),sizeof (float))];
std::memcpy(ptr, &u.a, sizeof (float));
std::memcpy(&u.b, ptr, sizeof (int));

std::cout << u.b;

And now it becomes well-defined, because this kind of type punning (copying the object representation with memcpy) is allowed. Also, as you can see, the memory of u stays the same after the memcpy() calls.
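As an aside, the same punning is usually written without the union or the heap buffer at all. Here is a minimal sketch of that more conventional memcpy form (it assumes int and float have the same size):

#include <cstring>
#include <iostream>

int main()
{
    float f = 1.0f;
    int i;
    static_assert(sizeof i == sizeof f, "sketch assumes int and float have the same size");
    std::memcpy(&i, &f, sizeof i);  // copy the object representation of f into i
    std::cout << i << '\n';         // on IEEE-754 systems this prints 1065353216 (0x3F800000)
}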


Now let's add threads and the volatile keyword.
union U {float a; int b;};

volatile U u;
std::memset(const_cast<U*>(&u), 0, sizeof u); // cast needed: a volatile U* does not convert to void*
u.a = 1.0f;

std::thread th([&]
{
    char *ptr = new char[sizeof u];
    std::memcpy(ptr, const_cast<float*>(&u.a), sizeof u); // casts drop volatile: std::memcpy does not take volatile pointers
    std::memcpy(const_cast<int*>(&u.b), ptr, sizeof u);
});
th.join();

std::cout << u.b;

The logic remains the same; we just have a second thread. Because of the volatile keyword, the code remains well-defined.

In real code this second thread can be implemented through any crappy threading library, and the compiler can be unaware of that second thread. But because of the volatile keyword it's still well-defined.


But what if there are no other threads?
union U {float a; int b;};

volatile U u;
std::memset(const_cast<U*>(&u), 0, sizeof u); // cast needed: a volatile U* does not convert to void*
u.a = 1.0f;
std::cout << u.b;

There are no other threads. But the compiler does not know that there are no other threads!

From the compiler's point of view, nothing has changed! And if the third example was well-defined, the last one must be well-defined too!

And we don't need that second thread anyway, because it does not change the memory of u.


If volatile is used, the compiler has to assume that u can be modified silently at any point. And with such a modification, any member can become the active one.

And so, the compiler can never track which member of a volatile union is active. It cannot assume that a member remains active after it was assigned to (and that the other members remain inactive), even if nothing actually modifies the union.

And so, in the last two examples the compiler must give me the exact bit representation of 1.0f reinterpreted as an int.


The questions are: Is my reasoning correct? Are the 3rd and 4th examples really well-defined? What does the standard say about it?
volatile has nothing to do with threads. See Why does volatile exist? – Bo Persson
First of all, in your fourth example you read a variable without synchronization, so the compiler can assume no other thread writes to it. volatile only means that every operation on the variable is an observable side effect; that is unrelated to multithreaded execution. So that example is definitely wrong. And second: why are you doing such things in the first place? – Baum mit Augen
@BaummitAugen 1. I do such things because it looks like a hack for easy type punning in C++. 2. Do you mean that when the compiler sees th.join() it assumes that there is a possibility that the volatile variable was modified? But what if I use a nonstandard threading library, for example SDL threads? The compiler has no knowledge that SDL_WaitThread() joins a thread. 3. Regarding "volatile only means that every operation on the variable is an observable side effect": can you explain this? I don't understand what you mean. – HolyBlackCat
@HolyBlackCat It does not need to know that, because it knows that you passed the variable to some function by reference or passed its address; otherwise the new thread could not modify it. So it can deduce from that alone that the variable might have changed. – Baum mit Augen
Besides, the second example seems wrong too, as you are reading uninitialized bytes if sizeof(float) < sizeof(int). – Baum mit Augen

2 Answers

9 votes

In real code this second thread can be implemented through any crappy threading library, and the compiler can be unaware of that second thread. But because of the volatile keyword it's still well-defined.

That statement is false, and so the rest of the logic upon which you base your conclusion is unsound.

Suppose you have code like this:

int* currentBuf = bufferStart;
while(currentBuf < bufferEnd)
{
    *currentBuf = foobar;    
    currentBuf++;
}

If foobar is not volatile then a compiler is permitted to reason as follows: "I know that foobar is never aliased by currentBuf and therefore does not change within the loop, therefore I may optimize the code as"

int* currentBuf = bufferStart;
int temp = foobar;
while(currentBuf < bufferEnd)
{
    *currentBuf = temp;    
    currentBuf++;
}

If foobar is volatile then this and many other code generation optimizations are disabled. Notice I said code generation. The CPU is entirely within its rights however to move reads and writes around to its heart's content, provided that the memory model of the CPU is not violated.

In particular, the compiler is not required to force the CPU to go back to main memory on every read and write of foobar. All it is required to do is to eschew certain optimizations. (This is not strictly true; the compiler is also obliged to ensure that certain properties involving long jumps are preserved, and a few other minor details that have nothing to do with threading.) If there are two threads, and each is on a different processor, and each processor has a different cache, volatile introduces no requirement that the caches be made coherent if they both contain a copy of the memory for foobar.

Some compilers may choose to implement those semantics for your convenience, but they are not required to do so; consult your compiler documentation.

I note that C# and Java do require acquire and release semantics on volatiles, but those requirements can be surprisingly weak. In particular, the x86 will not reorder two volatile writes or two volatile reads, but is permitted to reorder a volatile read of one variable before a volatile write of another, and in fact the x86 processor can do so in rare situations. (See http://blog.coverity.com/2014/03/26/reordering-optimizations/ for a puzzle written in C# that illustrates how low-lock code can be wrong even if everything is volatile and has acquire-release semantics.)

The moral is: even if your compiler is helpful and does impose additional semantics on volatile variables like C# or Java, it still may be the case that there is no consistently observed sequence of reads and writes across all threads; many memory models do not impose this requirement. This can then cause weird runtime behaviour. Again, consult your compiler documentation if you want to know what volatile means for you.
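As an aside, if what you actually want is guaranteed cross-thread visibility in standard C++, the tool for that is std::atomic, not volatile. A minimal sketch of acquire/release publication (my illustration, not something the volatile code above gives you):

#include <atomic>
#include <iostream>
#include <thread>

int main()
{
    int payload = 0;                 // ordinary data, published via the atomic flag
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // (a) write the data
        ready.store(true, std::memory_order_release);  // (b) release: makes (a) visible to the acquirer
    });

    while (!ready.load(std::memory_order_acquire)) {}  // acquire: pairs with the release store above
    std::cout << payload << '\n';    // guaranteed to print 42

    producer.join();
}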

1 vote

No, your reasoning is wrong. The volatile part is a general misunderstanding: volatile does not work the way you state.

The union part is wrong as well. Read this: Accessing inactive union member and undefined behavior?

With C++(11) you can only expect correct/well-defined behaviour when the last write corresponds to the next read.
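For illustration (my own sketch, not code from the question), the pattern that is guaranteed is reading back the member that was written last:

#include <iostream>

union U { float a; int b; };

int main()
{
    U u;
    u.a = 1.0f;
    std::cout << u.a << '\n';  // OK: the last write was to a, and we read a

    u.b = 42;
    std::cout << u.b << '\n';  // OK: the last write was to b, and we read b

    // std::cout << u.a;       // would read an inactive member: undefined behavior
}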