Memory model ordering and visibility?

Question

I tried looking for details on this, and I even read the standard on mutexes and atomics, but I still couldn't understand the C++11 memory model visibility guarantees. From what I understand, the very important feature of a mutex BESIDES mutual exclusion is ensuring visibility. That is, it is not enough that only one thread at a time is increasing the counter; it is important that the thread increases the counter value that was stored by the thread that last used the mutex. (I really don't know why people don't mention this more when discussing mutexes; maybe I had bad teachers :).) So from what I can tell, atomics don't enforce immediate visibility (from the person who maintains boost::thread and has implemented the C++11 thread and mutex library):

A fence with memory_order_seq_cst does not enforce immediate visibility to other threads (and neither does an MFENCE instruction). The C++0x memory ordering constraints are just that --- ordering constraints. memory_order_seq_cst operations form a total order, but there are no restrictions on what that order is, except that it must be agreed on by all threads, and it must not violate other ordering constraints. In particular, threads may continue to see "stale" values for some time, provided they see values in an order consistent with the constraints.

And I'm OK with that. But the problem is that I have trouble understanding which C++11 constructs regarding atomics are "global" and which only ensure consistency on atomic variables. In particular, I have trouble understanding which (if any) of the following memory orderings guarantee that there will be a memory fence before and after loads and stores: http://www.stdthread.co.uk/doc/headers/atomic/memory_order.html

From what I can tell, std::memory_order_seq_cst inserts a memory barrier, while the others only enforce ordering of the operations on a certain memory location.

1. Can somebody clear this up? I presume a lot of people are going to make horrible bugs using std::atomic, especially if they don't use the default (std::memory_order_seq_cst) memory ordering.
2. If I'm right, does that mean that the second line is redundant in this code:

atomicVar.store(42);
std::atomic_thread_fence(std::memory_order_seq_cst);

3. Do std::atomic_thread_fences have the same requirements as mutexes, in the sense that to ensure sequential consistency on non-atomic variables one must call std::atomic_thread_fence(std::memory_order_seq_cst); before loads and std::atomic_thread_fence(std::memory_order_seq_cst); after stores?
4. Is

{
    regularSum += atomicVar.load();
    regularVar1++;
    regularVar2++;
}
//...
{
    regularVar1++;
    regularVar2++;
    atomicVar.store(74656);
}

equivalent to

std::mutex mtx;
{
    std::unique_lock<std::mutex> ul(mtx);
    regularSum += nowRegularVar;
    regularVar1++;
    regularVar2++;
}
//..
{
    std::unique_lock<std::mutex> ul(mtx);
    regularVar1++;
    regularVar2++;
    nowRegularVar = 74656;
}

I think not, but I would like to be sure.

EDIT: 5. Can the assert fire?
Only two threads exist.

std::atomic<int*> p(nullptr);

first thread writes

{
    int* nonatomic_p = (int*) malloc(16*1024*sizeof(int));
    for (int i = 0; i < 16*1024; ++i)
        nonatomic_p[i] = 42;
    p = nonatomic_p;
}

second thread reads

{
    while (p==nullptr)
    {
    }
    assert(p[1234]==42);//1234-random idx in array
}

Anthony Williams · Accepted Answer · 2011-10-19T16:54:50

If you like to deal with fences, then a.load(memory_order_acquire) is equivalent to a.load(memory_order_relaxed) followed by atomic_thread_fence(memory_order_acquire). Similarly, a.store(x,memory_order_release) is equivalent to a call to atomic_thread_fence(memory_order_release) before a call to a.store(x,memory_order_relaxed). memory_order_consume is a special case of memory_order_acquire, for dependent data only. memory_order_seq_cst is special, and forms a total order across all memory_order_seq_cst operations. Mixed with the others it is the same as an acquire for a load, and a release for a store. memory_order_acq_rel is for read-modify-write operations, and is equivalent to an acquire on the read part and a release on the write part of the RMW.
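To make the fence formulation concrete, here is a minimal sketch (the function and variable names are hypothetical) of the usual message-passing pattern written with standalone fences and relaxed operations instead of an acquire load and a release store. The release fence before the relaxed store and the acquire fence after the relaxed load that reads `true` synchronize with each other, so the plain write to `payload` is visible to the consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value of payload seen by the consumer.
int fence_message_passing()
{
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> flag(false);
    int observed = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence, then relaxed store: plays the role of
        // flag.store(true, std::memory_order_release).
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        // Relaxed load, then acquire fence: plays the role of
        // flag.load(std::memory_order_acquire).
        while (!flag.load(std::memory_order_relaxed))
            std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;          // guaranteed to read 42
        assert(observed == 42);
    });

    producer.join();
    consumer.join();                 // join() makes observed visible here
    return observed;
}
```

Calling `fence_message_passing()` from `main` always yields 42; replacing either fence with nothing would turn the access to `payload` into a data race.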

The use of ordering constraints on atomic operations may or may not result in actual fence instructions, depending on the hardware architecture. In some cases the compiler will generate better code if you put the ordering constraint on the atomic operation rather than using a separate fence.

On x86, loads are always acquire, and stores are always release. memory_order_seq_cst requires stronger ordering with either an MFENCE instruction or a LOCK prefixed instruction (there is an implementation choice here as to whether to make the store have the stronger ordering or the load). Consequently, standalone acquire and release fences are no-ops, but atomic_thread_fence(memory_order_seq_cst) is not (again requiring an MFENCE or LOCKed instruction).
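The extra cost of memory_order_seq_cst shows up in the classic store-buffering pattern, sketched below (the function name is hypothetical). Each thread stores to one variable and then loads the other. With seq_cst on every operation, all four operations sit in one total order, so the outcome where both loads return 0 is forbidden. If the stores were only release and the loads only acquire, that outcome would be allowed, and x86 store buffers can actually produce it; ruling it out is what the MFENCE or LOCKed instruction is for.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: counts the runs in which BOTH loads saw 0.
// With seq_cst everywhere the standard forbids that outcome, so the
// result is always 0.
int store_buffering_both_zero(int runs)
{
    int both_zero = 0;
    for (int n = 0; n < runs; ++n) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0)
            ++both_zero;
    }
    return both_zero;
}
```

Running it many times cannot prove the guarantee, but a non-zero result would indicate a broken implementation.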

An important effect of the ordering constraints is that they order other operations.

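#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready(false);
int i=0;

void thread_1()
{
    i=42;
    ready.store(true,std::memory_order_release);
}

void thread_2()
{
    while(!ready.load(std::memory_order_acquire)) std::this_thread::yield();
    assert(i==42);
}

int main()
{
    std::thread t1(thread_1), t2(thread_2);
    t1.join();
    t2.join();
}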

thread_2 spins until it reads true from ready. Since the store to ready in thread_1 is a release, and the load is an acquire, the store synchronizes-with the load, and the store to i happens-before the load from i in the assert, and the assert will not fire.

2) The second line in

atomicVar.store(42);
std::atomic_thread_fence(std::memory_order_seq_cst);

is indeed potentially redundant, because the store to atomicVar uses memory_order_seq_cst by default. However, if there are other non-memory_order_seq_cst atomic operations on this thread then the fence may have consequences. For example, it would act as a release fence for a subsequent a.store(x,memory_order_relaxed).
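A sketch of the case where the fence does matter (the function name and the `data` variable are hypothetical): after the seq_cst store to atomicVar, the thread publishes a plain write through a relaxed store to `a`. The seq_cst fence acts as a release fence for that relaxed store, so a reader that sees `a == true` and then issues an acquire fence is guaranteed to see `data == 7`. Without the fence, the seq_cst store to atomicVar would not order anything with respect to `a`, and the read of `data` would be a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value of data seen by the reader.
int fence_before_relaxed_store()
{
    std::atomic<int> atomicVar(0);
    std::atomic<bool> a(false);
    int data = 0;                    // non-atomic
    int observed = -1;

    std::thread writer([&] {
        data = 7;
        atomicVar.store(42);         // memory_order_seq_cst by default
        std::atomic_thread_fence(std::memory_order_seq_cst);
        // The fence above also serves as a release fence for this relaxed store.
        a.store(true, std::memory_order_relaxed);
    });

    std::thread reader([&] {
        while (!a.load(std::memory_order_relaxed))
            std::this_thread::yield();
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = data;             // guaranteed to read 7
    });

    writer.join();
    reader.join();
    return observed;
}
```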

3) Fences and atomic operations do not work like mutexes. You can use them to build mutexes, but they do not work like them. You do not ever have to use atomic_thread_fence(memory_order_seq_cst). There is no requirement that any atomic operations are memory_order_seq_cst, and ordering on non-atomic variables can be achieved without it, as in the example above.
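As an illustration of building a mutex-like primitive from atomics, here is a minimal spinlock sketch (the class and function names are hypothetical, and a real mutex would block rather than spin). The acquire exchange in `lock()` pairs with the release store in `unlock()`, which gives the same visibility guarantee discussed in the question: each thread that takes the lock sees the counter value written by the previous holder, with no seq_cst operations or standalone fences at all.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class spinlock
{
    std::atomic<bool> locked{false};
public:
    void lock()
    {
        // exchange() returns the previous value; keep trying while it was true.
        // The acquire pairs with the release in unlock().
        while (locked.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};

// Hypothetical helper: two threads increment a plain int under the spinlock.
int spinlock_counter(int per_thread)
{
    spinlock lk;
    int counter = 0;                 // non-atomic, protected by lk
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            lk.lock();
            ++counter;
            lk.unlock();
        }
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

`spinlock_counter(10000)` returns 20000: no increments are lost, because every increment happens inside an acquire/release pair on the same atomic.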

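4) No, these are not equivalent. In your snippet without the mutex, nothing guarantees that the increments of the non-atomic regularVar1 and regularVar2 in one block happen-before those in the other, so it is a data race and thus undefined behaviour.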
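For contrast, a sketch that runs the mutex version from the question on two threads (the `run_question4` wrapper and the `Counts` struct are hypothetical). Because both blocks lock the same mutex, they execute one after the other and the second block sees every write made by the first, so both counters always end at 2, while regularSum ends at 0 or 74656 depending on which block took the lock first.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Results copied out after both threads have joined.
struct Counts { int regularSum, regularVar1, regularVar2; };

Counts run_question4()
{
    std::mutex mtx;
    int regularSum = 0, nowRegularVar = 0, regularVar1 = 0, regularVar2 = 0;

    std::thread first([&] {
        std::lock_guard<std::mutex> lk(mtx);
        regularSum += nowRegularVar;  // 0 or 74656, depending on lock order
        ++regularVar1;
        ++regularVar2;
    });
    std::thread second([&] {
        std::lock_guard<std::mutex> lk(mtx);
        ++regularVar1;
        ++regularVar2;
        nowRegularVar = 74656;
    });
    first.join();
    second.join();
    return {regularSum, regularVar1, regularVar2};
}
```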

5) No, your assert cannot fire. With the default memory ordering of memory_order_seq_cst, the store and load from the atomic pointer p work like the store and load in my example above, and the stores to the array elements are guaranteed to happen-before the reads.
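A runnable sketch of question 5 (the function name is hypothetical, and std::vector stands in for the question's malloc buffer as a simplification). The writer fills the array and then publishes the pointer with the default seq_cst store; the reader spins on the default seq_cst load, and once it sees a non-null pointer every element written before the store is visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: returns the element at idx as seen by the reader.
int question5_read(int idx)
{
    std::atomic<int*> p(nullptr);
    std::vector<int> storage(16 * 1024);  // zero-filled before the threads start
    int seen = -1;

    std::thread writer([&] {
        int* nonatomic_p = storage.data();
        for (int i = 0; i < 16 * 1024; ++i)
            nonatomic_p[i] = 42;
        p = nonatomic_p;                  // seq_cst store (acts as a release)
    });

    std::thread reader([&] {
        int* q;
        while ((q = p) == nullptr)        // seq_cst load (acts as an acquire)
            std::this_thread::yield();
        seen = q[idx];
        assert(seen == 42);
    });

    writer.join();
    reader.join();
    return seen;
}
```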
