
Consider the following code:

#include <atomic>
#include <cstdio>

struct payload
{
    std::atomic< int > value;
};

std::atomic< payload* > pointer( nullptr );

void thread_a()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    std::atomic_thread_fence( std::memory_order_release );
    pointer.store( p, std::memory_order_relaxed );
}

void thread_b()
{
    payload* p = pointer.load( std::memory_order_consume );
    if ( p )
        printf( "%d\n", p->value.load( std::memory_order_relaxed ) );
}

Does C++ make any guarantees about the interaction of the fence in thread a with the consume operation in thread b?

I know that in this example case I can replace the fence + atomic store with a store-release and have it work. But my question is about this particular case using the fence.
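For reference, the store-release variant mentioned above could look like the following minimal sketch. The spin loop in thread_b and the run_once helper are hypothetical additions for illustration, so that the example always observes the published pointer; they are not part of the original code.

```cpp
#include <atomic>
#include <thread>

struct payload
{
    std::atomic< int > value{ 0 };
};

std::atomic< payload* > pointer( nullptr );

void thread_a()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    // Release store: the write to p->value is ordered before the publication of p.
    pointer.store( p, std::memory_order_release );
}

int thread_b()
{
    payload* p = nullptr;
    // Spin until the pointer has been published (illustration only).
    while ( !( p = pointer.load( std::memory_order_consume ) ) )
        std::this_thread::yield();
    // This load depends on p, so it is ordered after the consume load.
    return p->value.load( std::memory_order_relaxed );
}

// Hypothetical helper: runs both threads once and returns what thread_b read.
int run_once()
{
    int seen = -1;
    std::thread a( thread_a );
    std::thread b( [&seen] { seen = thread_b(); } );
    a.join();
    b.join();
    delete pointer.exchange( nullptr );
    return seen;
}
```

Here the release store pairs directly with the consume load, which is a combination the standard describes explicitly.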

Reading the standard text I can find clauses about the interaction of a release fence with an acquire fence, and of a release fence with an acquire operation, but nothing about the interaction of a release fence and a consume operation.
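As an illustration of one of the combinations the standard does list, here is a minimal sketch of a release fence paired with an acquire fence. The spin loop and the run_once helper are hypothetical additions for illustration only.

```cpp
#include <atomic>
#include <thread>

struct payload
{
    std::atomic< int > value{ 0 };
};

std::atomic< payload* > pointer( nullptr );

void thread_a()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    std::atomic_thread_fence( std::memory_order_release );
    pointer.store( p, std::memory_order_relaxed );
}

int thread_b()
{
    payload* p = nullptr;
    // Relaxed load: no ordering by itself (spin loop is for illustration only).
    while ( !( p = pointer.load( std::memory_order_relaxed ) ) )
        std::this_thread::yield();
    // The acquire fence after the load pairs with the release fence in thread_a.
    std::atomic_thread_fence( std::memory_order_acquire );
    return p->value.load( std::memory_order_relaxed );
}

// Hypothetical helper: runs both threads once and returns what thread_b read.
int run_once()
{
    int seen = -1;
    std::thread a( thread_a );
    std::thread b( [&seen] { seen = thread_b(); } );
    a.join();
    b.join();
    delete pointer.exchange( nullptr );
    return seen;
}
```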

Replacing the consume with an acquire would make the code standards-compliant, I think. But as far as I understand the memory ordering constraints implemented by processors, I should only really require the weaker 'consume' ordering in thread b, as the memory barrier forces all stores in thread a to be visible before the store to the pointer, and reading the payload is dependent on the read from the pointer.
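The acquire variant described above might be sketched as follows. It keeps the release fence in thread a and only changes the load in thread b; the spin loop and the run_once helper are hypothetical additions for illustration.

```cpp
#include <atomic>
#include <thread>

struct payload
{
    std::atomic< int > value{ 0 };
};

std::atomic< payload* > pointer( nullptr );

void thread_a()
{
    payload* p = new payload();
    p->value.store( 10, std::memory_order_relaxed );
    std::atomic_thread_fence( std::memory_order_release );
    pointer.store( p, std::memory_order_relaxed );
}

int thread_b()
{
    payload* p = nullptr;
    // Acquire load: pairs with the release fence that precedes the relaxed store.
    while ( !( p = pointer.load( std::memory_order_acquire ) ) )
        std::this_thread::yield();
    return p->value.load( std::memory_order_relaxed );
}

// Hypothetical helper: runs both threads once and returns what thread_b read.
int run_once()
{
    int seen = -1;
    std::thread a( thread_a );
    std::thread b( [&seen] { seen = thread_b(); } );
    a.join();
    b.join();
    delete pointer.exchange( nullptr );
    return seen;
}
```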

Does the standard agree?

"as the memory barrier forces all stores in thread a to be visible before the store to the pointer": for x86 (or TSO in general) this seems to be correct, but for weaker models (such as SPARC RMO) it isn't exactly a correct description. In general (in particular, outside of the TSO world) memory barriers require a counterpart memory fence in the reading thread; see kernel.org/doc/Documentation/memory-barriers.txt for details. TSO can be seen as a single per-CPU write buffer, and flushing it with a memory fence does make things consistent, but in general that isn't guaranteed. - No-Bugs Hare
@Edmund Kapusniak I was under the impression that a load tagged with std::memory_order_consume only gave you appropriate consume semantics if the corresponding store is tagged with either release, acq_rel, or seq_cst. So the consume load might have the same guarantees if it were instead tagged with relaxed, since the store to pointer is also relaxed. - Alejandro
Are you developing a virus? (Asking because of the payload pointer XD) - CoffeDeveloper
@Alejandro "only gave you appropriate consume semantics if the corresponding store is tagged" The principle of std::atomic_thread_fence( std::memory_order_release ) is to generate a delayed "tag" for the relaxed stores that follow it; in other words, a release store is an immediate, named store barrier, whereas a fence is an anonymous, delayed barrier (a named barrier works on only that object; an anonymous one applies to every object). - curiousguy
@No-BugsHare "TSO can be seen as a single per-CPU write buffer and flushing it with a memory fence does make things consistent" A fence on the writer side on TSO? How is that possible? Fence what with respect to what? How do you "flush" a buffer? - curiousguy

2 Answers


Your code works.

I know that in this example case I can replace the fence + atomic store with a store-release and have it work. But my question is about this particular case using the fence.

A fence combined with a relaxed atomic operation is stronger than the corresponding atomic operation with the same ordering. For example (from http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence, Notes):

While an atomic store-release operation prevents all preceding writes from moving past the store-release, an atomic_thread_fence with memory_order_release ordering prevents all preceding writes from moving past all subsequent stores.
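The difference in that note can be sketched as follows: a single release fence covers every relaxed store that comes after it, so a reader that acquires either flag sees the earlier write. The names shared_data, flag1, flag2, writer, reader, and run_once are hypothetical and chosen only for this illustration.

```cpp
#include <atomic>
#include <thread>

int shared_data = 0;                    // plain, non-atomic data
std::atomic< bool > flag1( false );
std::atomic< bool > flag2( false );

void writer()
{
    shared_data = 42;
    // One fence orders the write above before BOTH relaxed stores below.
    std::atomic_thread_fence( std::memory_order_release );
    flag1.store( true, std::memory_order_relaxed );
    flag2.store( true, std::memory_order_relaxed );
}

int reader()
{
    // Acquiring the second flag is enough; the fence also covers that store.
    while ( !flag2.load( std::memory_order_acquire ) )
        std::this_thread::yield();
    return shared_data;
}

// Hypothetical helper: runs both threads once and returns what reader saw.
int run_once()
{
    int seen = -1;
    std::thread a( writer );
    std::thread b( [&seen] { seen = reader(); } );
    a.join();
    b.join();
    return seen;
}
```

With a store-release on flag1 instead of the fence, only flag1 would carry the ordering; a reader that acquires flag2 alone would then have no guarantee about shared_data.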


Although that's clearly the intent, the way the interaction of fences and atomic operations is specified means that only the listed combinations are officially supported. (That style of specification is not only verbose, hard to read, and even harder to turn into a valid intuition; it is also easy to leave incomplete.)

I see nothing in the standard that supports pairing a consume operation with a release fence, even though a normal implementation could not fail to support it, except by making a special effort during global program optimization to detect that particular use case and deliberately break it.