25 votes

My understanding of std::memory_order_acquire and std::memory_order_release is as follows:

Acquire means that no memory accesses which appear after the acquire fence can be reordered to before the fence.

Release means that no memory accesses which appear before the release fence can be reordered to after the fence.

What I don't understand is why with the C++11 atomics library in particular, the acquire fence is associated with load operations, while the release fence is associated with store operations.

To clarify, the C++11 <atomic> library enables you to specify memory fences in two ways: either you can specify a fence as an extra argument to an atomic operation, like:

x.load(std::memory_order_acquire);

Or you can use std::memory_order_relaxed and specify the fence separately, like:

x.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);

What I don't understand is, given the above definitions of acquire and release, why does C++11 specifically associate acquire with load, and release with store? Yes, I've seen many of the examples that show how you can use an acquire/load with a release/store to synchronize between threads, but in general it seems that the idea of acquire fences (prevent memory reordering after statement) and release fences (prevent memory reordering before statement) is orthogonal to the idea of loads and stores.

So, why, for example, won't the compiler let me say:

x.store(10, std::memory_order_acquire);

I realize I can accomplish the above by using memory_order_relaxed, and then a separate atomic_thread_fence(memory_order_acquire) statement, but again, why can't I use store directly with memory_order_acquire?

A possible use case for this might be if I want to ensure that some store, say x = 10, happens before some other statement executes that might affect other threads.
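For reference, a minimal sketch (the function names and the spin loop are illustrative, not part of the question) of how that intent is normally expressed in C++11: the store gets release semantics, either on the operation itself or via a standalone release fence, and the other thread pairs it with an acquire load:

#include <atomic>

std::atomic<int> x{0};

void writer() {
    // Option 1: attach release semantics to the store itself.
    x.store(10, std::memory_order_release);
}

void writer_with_fence() {
    // Option 2: a standalone release fence followed by a relaxed store.
    std::atomic_thread_fence(std::memory_order_release);
    x.store(10, std::memory_order_relaxed);
}

void reader() {
    // Pairs with the release store: once this acquire load observes 10,
    // everything the writer did before its store is visible to this thread.
    while (x.load(std::memory_order_acquire) != 10) { /* spin */ }
}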

In a typical lock-free algorithm, you read an atomic to see if a shared resource is ready for consumption (ready to be acquired), and you write an atomic to indicate that a shared resource is ready to be used (to release the resource). You don't want reads of the shared resource to move before the atomic guarding it is checked; and you don't want initialization of the to-be-shared resource to move after the atomic is written to, indicating release. – Igor Tandetnik
In the example only atomic_thread_fence(std::memory_order_acquire) is a true fence. See 1.10:5 Multi-threaded executions and data races [intro.multithread] in the standard, which says (quoting draft n3797): "A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence." In contrast, x.load(std::memory_order_acquire) is an atomic operation that does an acquire operation on x; it would be a synchronization operation if the value matches a store release into x. – amdn
In the introduction the standard (draft n3797) doesn't restrict acquire operations to loads and release operations to stores. That is unfortunate. You have to go to clause 29.3:1 Order and consistency [atomics.order] to find "memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst: a load operation performs an acquire operation on the affected memory location" and "memory_order_release, memory_order_acq_rel, and memory_order_seq_cst: a store operation performs a release operation on the affected memory location". – amdn
@amdn But even a "true fence" doesn't have to produce a CPU fence at all; it interacts with preceding or subsequent atomic operations to produce some effect. Only very naive compilers will associate a given CPU instruction with each source-code occurrence of a "true fence". – curiousguy
"is orthogonal to the idea of loads and stores" Under atomic semantics as reads aren't even ordered events in the modification order. You need a write to get a place into that order; even you just always write the exact same value, the writes of the exact same value is ordered. Then you speak of after that write event in the modification order. (Physically that means a cache has taken the cache line.) But a release read would be ambiguous as other reads of the same write event aren't ordered. Would you change the semantic to include reads in the modification order?curiousguy

3 Answers

30 votes

Say I write some data, and then I write an indication that the data is now ready. It's imperative that any other thread that sees the indication that the data is ready also sees the write of the data itself. So prior writes cannot move past that write.

Say I read that some data is ready. It's imperative that any reads I issue after seeing that take place after the read that saw the data was ready. So subsequent reads cannot be moved to before that read.

So when you do a synchronized write, you typically need to make sure that all writes you did before that are visible to anyone who sees the synchronized write. And when you do a synchronized read, it's typically imperative that any reads you do after that take place after the synchronized read.

Or, to put it another way, an acquire is typically reading that you can take or access the resource, and subsequent reads and writes must not be moved before it. A release is typically writing that you are done with the resource, and preceding writes must not be moved to after it.
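A sketch of that pattern (variable and function names here are illustrative, not from the answer):

#include <atomic>

int payload = 0;                 // the data being published
std::atomic<bool> ready{false};  // the indication that the data is ready

void producer() {
    payload = 42;                                  // write the data
    ready.store(true, std::memory_order_release);  // release: prior writes cannot move past this store
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    // acquire: the read below cannot move before the load that saw the flag
    int value = payload;  // guaranteed to observe 42
    (void)value;
}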

0 votes

I think this post by Jeff Preshing could answer your question.

-3 votes

A std::memory_order_acquire fence only ensures that loads after the fence are not reordered before loads that precede the fence; thus memory_order_acquire cannot ensure that a store becomes visible to other threads before the subsequent loads execute. This is why memory_order_acquire is not supported for store operations; you may need memory_order_seq_cst to achieve something like an "acquiring store".

As an alternative, you may say

x.store(10, std::memory_order_relaxed);
x.load(std::memory_order_acquire);  // this introduces a data dependency

to ensure that later loads are not reordered before the store. Again, a fence does not work here.

Besides, a memory order attached to an atomic operation can be cheaper than a memory fence, because it only constrains ordering relative to that atomic instruction, not relative to all instructions before and after a fence.
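As a sketch of that difference on the acquire side (illustrative names, and a simplified account; the standard's rules for fences are in [atomics.fences]):

#include <atomic>

std::atomic<bool> flag{false};

bool check_with_operation() {
    // Acquire semantics apply to this one load only; later accesses
    // need only be ordered against this particular load.
    return flag.load(std::memory_order_acquire);
}

bool check_with_fence() {
    bool f = flag.load(std::memory_order_relaxed);
    // A standalone acquire fence acts as an acquire barrier for every
    // atomic load sequenced before it, a stronger (and often costlier)
    // guarantee than annotating a single load.
    std::atomic_thread_fence(std::memory_order_acquire);
    return f;
}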

See also the formal description and explanation for details.