Do the release-acquire visibility guarantees of std::mutex apply to only the critical section?

Question

I'm trying to understand these sections under the heading Release-Acquire ordering https://en.cppreference.com/w/cpp/atomic/memory_order

They say regarding atomic load and stores:

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B. That is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.
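To make the quoted guarantee concrete, here is a minimal sketch of the message-passing pattern it describes. The function name `message_passing` and the variables `payload`, `ready`, and `seen` are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value thread B reads from `payload`.
int message_passing() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // the variable both threads synchronize on
    int seen = -1;

    std::thread a([&] {
        payload = 42;                                  // non-atomic write
        ready.store(true, std::memory_order_release);  // release store
    });
    std::thread b([&] {
        // Spin until the acquire load observes the value written by A.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // A's earlier write is visible here
    });

    a.join();
    b.join();
    return seen;
}
```

Once B's acquire load has read `true`, the write to `payload` happens-before B's read of it, so calling `message_passing()` should return 42 with no data race.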

Then regarding mutexes:

Mutual exclusion locks, such as std::mutex or atomic spinlock, are an example of release-acquire synchronization: when the lock is released by thread A and acquired by thread B, everything that took place in the critical section (before the release) in the context of thread A has to be visible to thread B (after the acquire) which is executing the same critical section.
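As a rough illustration of the "atomic spinlock" mentioned in that quote, here is a sketch built on `std::atomic<bool>`. The class `SpinLock` and the function `count_with_spinlock` are hypothetical names, and this is a teaching sketch rather than a production-quality lock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical minimal spinlock: acquire on lock, release on unlock.
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange returns the previous value; loop while someone else held it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};

// Two threads increment a plain (non-atomic) counter under the spinlock.
long count_with_spinlock(int iterations) {
    SpinLock lock;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            lock.lock();
            ++counter;  // critical section
            lock.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

Each `unlock()` is a release store and each successful `lock()` is an acquire read-modify-write on the same variable, so every increment is visible to the next thread that takes the lock. `count_with_spinlock(10000)` should return 20000.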

The first paragraph seems to say that with an atomic store and load (tagged memory_order_release and memory_order_acquire), thread B is guaranteed to see everything thread A wrote, including non-atomic writes.

The second paragraph seems to suggest that a mutex works the same way, except that the scope of what is visible to B is limited to whatever was wrapped in the critical section. Is that an accurate interpretation? Or would every write, even those before the critical section, be visible to B?

Congratulations, you have found your way to the darkest corners of c++11 ! I recommend reading kernel.org/doc/Documentation/memory-barriers.txt (didn't finish it myself though) — Arne J
While I am curious about how this is handled at the OS and CPU level, I think the whole point of the C++ memory model is that we shouldn't have to understand those underlying implementations in order to write software that is correct. Understanding those details should only really be necessary when implementing optimizations. I'm trying to get a better grasp of this at the C++ level before I dig any deeper. — Lockyer
@Lockyer Not only that, but an advanced compiler could compile an MT program in a much more subtle way than just emitting fences while avoiding the obviously redundant ones, as current compilers do. — curiousguy

Humphrey Winnebago · Accepted Answer · 2019-09-21T01:48:09

I think the cppreference quote about mutexes is written that way because, if you're using mutexes for synchronization, all shared variables used for communication should always be accessed inside the critical section.

The 2017 standard says in 4.7.1:

a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.

Update: I want to make sure I have a solid post because it is surprisingly hard to find this information on the web. Thanks to @Davis Herring for pointing me in the right direction.

The standard says

in 33.4.3.2.11 and 33.4.3.2.25:

mutex unlock synchronizes with subsequent lock operations that obtain ownership on the same object

(https://en.cppreference.com/w/cpp/thread/mutex/lock, https://en.cppreference.com/w/cpp/thread/mutex/unlock)

in 4.6.16:

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

https://en.cppreference.com/w/cpp/language/eval_order

in 4.7.1.9:

An evaluation A inter-thread happens before evaluation B if

4.7.1.9.1) -- A synchronizes-with B, or

4.7.1.9.2) -- A is dependency-ordered before B, or

4.7.1.9.3) -- for some evaluation X

4.7.1.9.3.1) ------ A synchronizes with X and X is sequenced before B, or

4.7.1.9.3.2) ------ A is sequenced before X and X inter-thread happens before B, or

4.7.1.9.3.3) ------ A inter-thread happens before X and X inter-thread happens before B.

https://en.cppreference.com/w/cpp/atomic/memory_order

So a mutex unlock B inter-thread happens before a subsequent lock C by 4.7.1.9.1.
Any evaluation A that is sequenced before the mutex unlock B (that is, comes before it in program order on the same thread) also inter-thread happens before C by 4.7.1.9.3.2.
Therefore an unlock() guarantees that all previous writes, even those made outside the critical section, are visible after a matching lock().
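Here is a small sketch of that conclusion, assuming hypothetical names (`visible_outside_critical_section`, `data`, `published`, `seen`). Thread A writes `data` before it ever takes the lock, and thread B reads `data` after it has released the lock. Only the flag `published` is accessed inside the critical section.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: returns the value thread B reads from `data`.
int visible_outside_critical_section() {
    std::mutex m;
    int data = 0;            // written and read OUTSIDE the critical section
    bool published = false;  // only accessed while holding m
    int seen = -1;

    std::thread a([&] {
        data = 42;  // sequenced before the lock/unlock below
        std::lock_guard<std::mutex> guard(m);
        published = true;
    });  // guard's destructor unlocks m (the release operation)

    std::thread b([&] {
        for (;;) {
            bool ok;
            {
                std::lock_guard<std::mutex> guard(m);  // acquire operation
                ok = published;
            }
            if (ok) break;
            std::this_thread::yield();
        }
        // B saw published == true, so its lock followed A's unlock.
        seen = data;  // read outside the critical section
    });

    a.join();
    b.join();
    return seen;
}
```

The read of `data` is safe only because B observed `published == true` under the lock, which proves that B's lock() came after A's unlock(). Without that check, B could take the lock first and its read of `data` would race with A's write.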

This conclusion is consistent with the way mutexes are implemented today (and were in the past) in that all program-order previous loads and stores are completed before unlocking. (More accurately, the stores have to be visible before the unlock is visible when observed by a matching lock operation in any thread.) There's no question that this is the accepted definition of release in theory and in practice.
