
I've been struggling with understanding how fences actually force code to synchronize.

for instance, say i have this code

#include <atomic>
#include <cassert>
#include <thread>

bool x = false;
std::atomic<bool> y;
std::atomic<int> z;
void write_x_then_y()
{
    x = true;
    std::atomic_thread_fence(std::memory_order_release);
    y.store(true, std::memory_order_relaxed);
}
void read_y_then_x()
{
    while (!y.load(std::memory_order_relaxed));
    std::atomic_thread_fence(std::memory_order_acquire);
    if (x)
        ++z;
}
int main()
{
    x = false;
    y = false;
    z = 0;
    std::thread a(write_x_then_y);
    std::thread b(read_y_then_x);
    a.join();
    b.join();
    assert(z.load() != 0);
}

Because the release fence is followed by an atomic store, and the acquire fence is preceded by an atomic load, everything synchronizes as it's supposed to, and the assert won't fire.

But if y were not an atomic variable, like this:

bool x;
bool y;
std::atomic<int> z;
void write_x_then_y()
{
    x = true;
    std::atomic_thread_fence(std::memory_order_release);
    y = true;
}
void read_y_then_x()
{
    while (!y);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (x)
        ++z;
}

then, I hear, there might be a data race. But why is that? Why must a release fence be followed by an atomic store, and an acquire fence be preceded by an atomic load, for the code to synchronize properly?

I would also appreciate it if anyone could provide an execution scenario in which a data race causes the assert to fire.

Stop. Watch this video. Watch it again, and then accept that specifying memory ordering is a dangerous waste of your time: channel9.msdn.com/Shows/Going+Deep/… - Richard Hodges
The fences have nothing to do with the fact that unordered, concurrent access to non-atomic variables is UB according to the memory model. Fences afford synchronization on correct code, but they don't prevent the UB. - Kerrek SB
Honestly, the best answer to this question is probably, "because that's what the rules say, and systems are free to break code that doesn't follow the rules". - David Schwartz
Very nice video @RichardHodges. It helped explain a lot of acquire-release semantics that I wasn't aware of. I normally don't ever specify memory ordering (I just roll with the default sequentially consistent model :) but I was just curious as to why such a weird implementation is necessary for the above code to not have a data race. - GamefanA
@user3769877 I really think that presentation should be mandatory viewing for anyone who is about to write multithreaded code. - Richard Hodges

1 Answer


No real data race is the problem in your second snippet. The snippet would be fine if the compiler literally generated machine code from exactly what is written.

But the compiler is free to generate any machine code that is equivalent to the original under the rules for a single-threaded program.

E.g., the compiler can notice that the variable y doesn't change within the while (!y) loop, so it may load the variable once into a register and use only that register in subsequent iterations. So, if y is initially false, you get an infinite loop.

Another possible optimization is to remove the while (!y) loop entirely, because it contains no accesses to volatile or atomic variables and performs no synchronization operations. (The C++ Standard says that any conforming thread must eventually do one of those things or terminate, so the compiler may assume the loop terminates and rely on that when optimizing.)

And so on.

More generally, the C++ Standard specifies that unsynchronized concurrent access to any non-atomic variable leads to undefined behavior, which is to say "the warranty is void". That is why you should use an atomic y variable.

On the other hand, the variable x doesn't need to be atomic, because the memory fences ensure that accesses to it are not concurrent.