4
votes
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto job = [&] {
        int asdf = bar.load();
        // std::lock_guard lg(mx);
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    std::thread t1(job);
    std::thread t2(job);
    t1.join();
    t2.join();
}

This obviously is not guaranteed to work, but it does work with the mutex (when the lock_guard line is uncommented). But how can that be explained in terms of the formal definitions of the standard?

Consider this excerpt from cppreference:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire [as is the case with default atomics], all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

Atomic loads and stores (with the default or with the specific acquire and release memory order specified) have the mentioned acquire-release semantics. (So does a mutex's lock and unlock.)
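Written out explicitly, the acquire-release pairing from that quote looks like the following sketch (the function name and the spin-wait are mine for illustration, not from the question). The key point is that the guarantee only kicks in once the acquire load actually observes the value written by the release store:

```cpp
#include <atomic>
#include <thread>
#include <vector>

int publish_and_consume() {
    std::vector<int> data;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        data.push_back(42);                           // non-atomic write (A)
        ready.store(true, std::memory_order_release); // store that publishes A
    });

    int seen = 0;
    std::thread consumer([&] {
        // Spin until the acquire load actually observes the release store.
        while (!ready.load(std::memory_order_acquire)) {}
        // Now the store synchronizes-with this load, so the push_back
        // happens-before the read below and is guaranteed to be visible.
        seen = data[0];
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If the consumer skipped the spin loop and read `data` regardless of what the load returned, the program would be back in undefined-behavior territory, which is exactly the situation in the question's code.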

An interpretation of that wording could be that when Thread 2's load operation syncs with the store operation of Thread1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector-modification, making this well-defined. But pretty much everyone would agree that this can lead to a segmentation fault and would surely do so if the job function ran its three lines in a loop.

What standard wording explains the obvious difference in capability between the two tools, given that the wording above seems to imply that the atomic would synchronize in the same way?

I know when to use mutexes and atomics, and I know that the example doesn't work because no synchronization actually happens. My question is how the definition is to be interpreted so it doesn't contradict the way it works in reality.

3
How do you expect foo.emplace_back(1); to work in multiple threads without synchronization? – Slava
Running it instantly results in a segmentation fault, which is in line with the mental model of C++ that most programmers, including me, have. I admit I have trouble defining what my mental model actually is, even though I have used both atomics and mutexes for a long time with success. I simply never considered the formal definitions of the memory orders before. – JMC
@Slava I don't expect it to work. My problem is that after reading the formal definition of what atomics entail, i.e. acquire-release semantics, it seems to me like it should work according to the letter of the law. – JMC
bar itself is free of data races, but that doesn't prevent a data race on foo. They are unrelated objects. – Blastfurnace
@JMC: You may find this interesting/educational: youtube.com/watch?v=ZQFzMfHIxng – engf-010

3 Answers

2
votes

The quoted passage means that when B loads the value that A stored, then by observing that the store happened, B can also be assured that everything A did before the store has also happened and is visible.

But this doesn't tell you anything if the store has not in fact happened yet!

I would agree that if the load in your thread B returned 1, it could safely conclude that the other thread had finished its store and therefore had exited the critical section, and therefore B could safely use foo. But it is entirely possible that both loads return 0, if both threads do their loads before either one does its store. Your code doesn't even look at the value that was loaded, so both threads may enter the critical section together in that case.

The following code would be a safe, though inefficient, way to use an atomic to protect a critical section. It ensures that A will execute the critical section first, and B will wait until A has finished before proceeding. (Obviously if both threads wait for the other then you have a deadlock.)

#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto jobA = [&] {
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    auto jobB = [&] {
        while (bar.load() == 0) /* spin */ ;
        foo.emplace_back(1);
    };

    std::thread t1(jobA);
    std::thread t2(jobB);
    t1.join();
    t2.join();
}
1
vote

Setting aside the elephant in the room that none of the C++ containers are thread-safe without employing locking of some sort (so forget about using emplace_back without implementing locking), and focusing on the question of why atomic objects alone are not sufficient:

You need more than atomic objects. You also need sequencing.

All that an atomic object gives you is that when an object changes state, any other thread will either see its old value or its new value, and it will never see any "partially old/partially new", or "intermediate" value.

But it makes no guarantee whatsoever as to when other execution threads will "see" the atomic object's new value. At some point they (hopefully) will see the atomic object instantly flip to its new value. When? Eventually. That's all you get from atomics.

One execution thread may very well set an atomic object to a new value, but other execution threads will still have the old value cached, in some form or fashion, and will continue to see the atomic object's old value, and won't "see" the atomic object's new value until some indeterminate time passes (if ever).

Sequencing rules specify when objects' new values become visible to other execution threads. The simplest way to get both atomicity and easy-to-reason-about sequencing, in one fell swoop, is to use mutexes and condition variables, which handle all the hard details for you. You can still use atomics and, with careful logic, acquire/release operations to implement proper sequencing. But it's very easy to get wrong, and the worst of it is that you won't know it's wrong until your code starts going off the rails due to improper sequencing, and it will be nearly impossible to accurately reproduce the faulty behavior for debugging purposes.

But for nearly all common, routine, garden-variety tasks, mutexes and condition variables are the simplest solution to proper inter-thread sequencing.
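As a sketch of that garden-variety pattern (the function name and shape are assumptions for illustration), a condition variable lets one thread block until another has finished its writes, with the mutex providing both the atomicity and the sequencing:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> sequenced_handoff() {
    std::vector<int> items;
    std::mutex mx;
    std::condition_variable cv;
    bool produced = false;

    std::thread a([&] {
        std::lock_guard<std::mutex> lk(mx); // mutex guards items and the flag
        items.push_back(1);
        produced = true;
        cv.notify_one();                    // wake b after the state change
    });

    std::vector<int> snapshot;
    std::thread b([&] {
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return produced; }); // sleeps until a is done
        snapshot = items; // safe: we hold the lock and a has finished
    });

    a.join();
    b.join();
    return snapshot;
}
```

Unlike a spin on an atomic flag, the waiting thread sleeps instead of burning CPU, and the predicate re-check handles spurious wakeups.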

1
vote

The idea is that when Thread 2's load operation syncs with the store operation of Thread1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector-modification

Yes, all the writes done by foo.emplace_back(1); are guaranteed to be visible once the value stored by bar.store(foo.size()); has been observed. But what guarantees that foo.emplace_back(1); in thread 1 sees a complete, consistent state left by foo.emplace_back(1); in thread 2, and vice versa? Both calls read and modify the internal state of the std::vector, and there is no memory barrier before the code reaches the atomic store. And even if every variable were read and modified atomically, a std::vector's state consists of multiple variables: at least a size, a capacity, and a pointer to the data. Changes to all of them must be synchronized together, and a memory barrier alone is not enough for that.
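To illustrate why that multi-variable state matters, here is a toy model of what emplace_back touches internally (an assumption for illustration, not any real library's layout):

```cpp
#include <cstddef>

struct ToyVec {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void push_back(int v) {
        if (size == capacity) {                       // (1) read size, capacity
            std::size_t cap = capacity ? capacity * 2 : 1;
            int* p = new int[cap];
            for (std::size_t i = 0; i < size; ++i)    // (2) copy into new buffer
                p[i] = data[i];
            delete[] data;                            // (3) free old buffer
            data = p;                                 // (4) write data
            capacity = cap;                           // (5) write capacity
        }
        data[size] = v;                               // (6) write through data
        ++size;                                       // (7) write size
    }
    ~ToyVec() { delete[] data; }
};
```

Two threads interleaving steps (1) through (7) race on all three fields; for instance, one thread can write through `data` at step (6) while the other has already freed that buffer at step (3). No fence placed after the calls can repair interleavings inside them.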

To explain a little more, let's create a simplified example:

int a = 0;
int b = 0;
std::atomic<int> at{0};

// thread 1 
int foo = at.load();
a = 1;
b = 2;
at.store(foo);

// thread 2
int foo = at.load();
int tmp1 = a;
int tmp2 = b;
at.store(tmp2);

Now you have 2 problems:

  1. There is no guarantee that when tmp2's value is 2, tmp1's value will be 1, as you read a and b before the atomic operation.

  2. There is no guarantee that when at.store(tmp2) is executed, either a == 0 and b == 0, or a == 1 and b == 2; it could be that a == 1 but still b == 0.

Is that clear?

But:

// thread 1 
mutex.lock();
a = 1;
b = 2;
mutex.unlock();

// thread 2
mutex.lock();
int tmp1 = a;
int tmp2 = b;
mutex.unlock();

You either get tmp1 == 0 and tmp2 == 0, or tmp1 == 1 and tmp2 == 2. Do you see the difference?
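For contrast, the atomic version can give the same guarantee, but only if thread 2 actually waits until it has observed thread 1's release store. A sketch (the spin-wait and the return shape are mine, not from the example above):

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::pair<int, int> fixed_example() {
    int a = 0, b = 0;
    std::atomic<int> at{0};

    std::thread t1([&] {
        a = 1;
        b = 2;
        at.store(1, std::memory_order_release); // publishes a and b
    });

    int tmp1 = 0, tmp2 = 0;
    std::thread t2([&] {
        // Wait until the release store has actually been observed.
        while (at.load(std::memory_order_acquire) == 0) {}
        tmp1 = a; // both reads now happen-after the writes in t1
        tmp2 = b;
    });

    t1.join();
    t2.join();
    return {tmp1, tmp2};
}
```

This removes the tmp1 == 0, tmp2 == 0 outcome that the mutex version allows, because thread 2 explicitly refuses to proceed until thread 1 is done; the mutex version permits either ordering but never a torn one.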