4
votes
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto job = [&] {
        int asdf = bar.load();
        // std::lock_guard lg(mx);
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    std::thread t1(job);
    std::thread t2(job);
    t1.join();
    t2.join();
}

This obviously is not guaranteed to work, but it does work with the mutex (when the lock_guard line is uncommented). But how can that be explained in terms of the formal definitions of the standard?

Consider this excerpt from cppreference:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire [as is the case with default atomics], all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

Atomic loads and stores (with the default or with the specific acquire and release memory order specified) have the mentioned acquire-release semantics. (So does a mutex's lock and unlock.)
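Written out explicitly, the acquire-release pairing from that quote looks like the following sketch (the function name and the spin-wait are mine for illustration, not from the question). The key point is that the guarantee only kicks in once the acquire load actually observes the value written by the release store:

```cpp
#include <atomic>
#include <thread>
#include <vector>

int publish_and_consume() {
    std::vector<int> data;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        data.push_back(42);                           // non-atomic write (A)
        ready.store(true, std::memory_order_release); // store that publishes A
    });

    int seen = 0;
    std::thread consumer([&] {
        // Spin until the acquire load actually observes the release store.
        while (!ready.load(std::memory_order_acquire)) {}
        // Now the store synchronizes-with this load, so the push_back
        // happens-before the read below and is guaranteed to be visible.
        seen = data[0];
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If the consumer skipped the spin loop and read `data` regardless of what the load returned, the program would be back in undefined-behavior territory, which is exactly the situation in the question's code.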

An interpretation of that wording could be that when Thread 2's load operation syncs with the store operation of Thread1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector-modification, making this well-defined. But pretty much everyone would agree that this can lead to a segmentation fault and would surely do so if the job function ran its three lines in a loop.

What standard wording explains the obvious difference in capability between the two tools, given that the wording above seems to imply that the atomic would synchronize in the same way?

I know when to use mutexes and atomics, and I know that the example doesn't work because no synchronization actually happens. My question is how the definition is to be interpreted so it doesn't contradict the way it works in reality.

3
How do you expect foo.emplace_back(1); to work in multiple threads without synchronization? – Slava
Running it instantly results in a segmentation fault, which is in line with the mental model of C++ that most programmers, including me, have. I admit I have trouble defining what my mental model actually is, even though I have used both atomics and mutexes for a long time with success. I simply never considered the formal definitions of the memory orders before. – JMC
@Slava I don't expect it to work. My problem is that after reading the formal definition of what atomics entail, i.e. acquire-release semantics, it seems to me like it should work according to the letter of the law. – JMC
bar itself is free of data races, but that doesn't prevent a data race on foo. They are unrelated objects. – Blastfurnace
@JMC: You may find this interesting/educational: youtube.com/watch?v=ZQFzMfHIxng – engf-010

3 Answers

2
votes

The quoted passage means that when B loads the value that A stored, then by observing that the store happened, B can also be assured that everything A did before the store has also happened and is visible.

But this doesn't tell you anything if the store has not in fact happened yet!

I would agree that if the load in your thread B returned 1, it could safely conclude that the other thread had finished its store and therefore had exited the critical section, and therefore B could safely use foo. But it is entirely possible that both loads return 0, if both threads do their loads before either one does its store. Your code doesn't even look at the value that was loaded, so both threads may enter the critical section together in that case.

The following code would be a safe, though inefficient, way to use an atomic to protect a critical section. It ensures that A will execute the critical section first, and B will wait until A has finished before proceeding. (Obviously if both threads wait for the other then you have a deadlock.)

#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> foo;
    std::atomic<int> bar{0};
    std::mutex mx;
    auto jobA = [&] {
        foo.emplace_back(1);
        bar.store(foo.size());
    };
    auto jobB = [&] {
        while (bar.load() == 0) /* spin */ ;
        foo.emplace_back(1);
    };

    std::thread t1(jobA);
    std::thread t2(jobB);
    t1.join();
    t2.join();
}
1
vote

Setting aside the elephant in the room that none of the C++ containers are thread-safe without employing locking of some sort (so forget about using emplace_back without implementing locking), and focusing on the question of why atomic objects alone are not sufficient:

You need more than atomic objects. You also need sequencing.

All that an atomic object gives you is that when an object changes state, any other thread will either see its old value or its new value, and it will never see any "partially old/partially new", or "intermediate" value.

But it makes no guarantee whatsoever as to when other execution threads will "see" the atomic object's new value. At some point they (hopefully) will see the atomic object instantly flip to its new value. When? Eventually. That's all you get from atomics.

One execution thread may very well set an atomic object to a new value, but other execution threads will still have the old value cached, in some form or fashion, and will continue to see the atomic object's old value, and won't "see" the atomic object's new value until some indeterminate time passes (if ever).

Sequencing rules specify when objects' new values become visible to other execution threads. The simplest way to get both atomicity and easy-to-reason-about sequencing, in one fell swoop, is to use mutexes and condition variables, which handle all the hard details for you. You can still use atomics and, with careful logic, acquire/release operations to implement proper sequencing. But it's very easy to get wrong, and the worst of it is that you won't know it's wrong until your code starts going off the rails due to improper sequencing, and it will be nearly impossible to accurately reproduce the faulty behavior for debugging purposes.

But for nearly all common, routine, garden-variety tasks, mutexes and condition variables are the simplest solution to proper inter-thread sequencing.
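As a sketch of that garden-variety pattern (the function name and shape are assumptions for illustration), a condition variable lets one thread block until another has finished its writes, with the mutex providing both the atomicity and the sequencing:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> sequenced_handoff() {
    std::vector<int> items;
    std::mutex mx;
    std::condition_variable cv;
    bool produced = false;

    std::thread a([&] {
        std::lock_guard<std::mutex> lk(mx); // mutex guards items and the flag
        items.push_back(1);
        produced = true;
        cv.notify_one();                    // wake b after the state change
    });

    std::vector<int> snapshot;
    std::thread b([&] {
        std::unique_lock<std::mutex> lk(mx);
        cv.wait(lk, [&] { return produced; }); // sleeps until a is done
        snapshot = items; // safe: we hold the lock and a has finished
    });

    a.join();
    b.join();
    return snapshot;
}
```

Unlike a spin on an atomic flag, the waiting thread sleeps instead of burning CPU, and the predicate re-check handles spurious wakeups.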

1
vote

The idea is that when Thread 2's load operation syncs with the store operation of Thread1, it is guaranteed to observe all (even non-atomic) writes that happened-before the store, such as the vector-modification

Yes, all the writes done by foo.emplace_back(1); are guaranteed to be visible once the value stored by bar.store(foo.size()); has been observed. But what guarantees that foo.emplace_back(1); in thread 1 sees a complete, consistent state left by foo.emplace_back(1); in thread 2, and vice versa? Both calls read and modify the internal state of the std::vector, and there is no memory barrier before the code reaches the atomic store. And even if every variable were read and modified atomically, a std::vector's state consists of multiple variables: at least a size, a capacity, and a pointer to the data. Changes to all of them must be synchronized together, and a memory barrier alone is not enough for that.
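To illustrate why that multi-variable state matters, here is a toy model of what emplace_back touches internally (an assumption for illustration, not any real library's layout):

```cpp
#include <cstddef>

struct ToyVec {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void push_back(int v) {
        if (size == capacity) {                       // (1) read size, capacity
            std::size_t cap = capacity ? capacity * 2 : 1;
            int* p = new int[cap];
            for (std::size_t i = 0; i < size; ++i)    // (2) copy into new buffer
                p[i] = data[i];
            delete[] data;                            // (3) free old buffer
            data = p;                                 // (4) write data
            capacity = cap;                           // (5) write capacity
        }
        data[size] = v;                               // (6) write through data
        ++size;                                       // (7) write size
    }
    ~ToyVec() { delete[] data; }
};
```

Two threads interleaving steps (1) through (7) race on all three fields; for instance, one thread can write through `data` at step (6) while the other has already freed that buffer at step (3). No fence placed after the calls can repair interleavings inside them.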

To explain a little more, let's create a simplified example:

int a = 0;
int b = 0;
std::atomic<int> at{0};

// thread 1 
int foo = at.load();
a = 1;
b = 2;
at.store(foo);

// thread 2
int foo = at.load();
int tmp1 = a;
int tmp2 = b;
at.store(tmp2);

Now you have 2 problems:

  1. There is no guarantee that when tmp2's value is 2, tmp1's value will be 1, as you read a and b before the atomic operation.

  2. There is no guarantee that when at.store(tmp2) is executed, either a == 0 and b == 0, or a == 1 and b == 2; it could be that a == 1 but still b == 0.

Is that clear?

But:

// thread 1 
mutex.lock();
a = 1;
b = 2;
mutex.unlock();

// thread 2
mutex.lock();
int tmp1 = a;
int tmp2 = b;
mutex.unlock();

You either get tmp1 == 0 and tmp2 == 0, or tmp1 == 1 and tmp2 == 2. Do you see the difference?
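For contrast, the atomic version can give the same guarantee, but only if thread 2 actually waits until it has observed thread 1's release store. A sketch (the spin-wait and the return shape are mine, not from the example above):

```cpp
#include <atomic>
#include <thread>
#include <utility>

std::pair<int, int> fixed_example() {
    int a = 0, b = 0;
    std::atomic<int> at{0};

    std::thread t1([&] {
        a = 1;
        b = 2;
        at.store(1, std::memory_order_release); // publishes a and b
    });

    int tmp1 = 0, tmp2 = 0;
    std::thread t2([&] {
        // Wait until the release store has actually been observed.
        while (at.load(std::memory_order_acquire) == 0) {}
        tmp1 = a; // both reads now happen-after the writes in t1
        tmp2 = b;
    });

    t1.join();
    t2.join();
    return {tmp1, tmp2};
}
```

This removes the tmp1 == 0, tmp2 == 0 outcome that the mutex version allows, because thread 2 explicitly refuses to proceed until thread 1 is done; the mutex version permits either ordering but never a torn one.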