Why GCC does not use LOAD(without fence) and STORE+SFENCE for Sequential Consistency?

Question

Here are four approaches to make Sequential Consistency in x86/x86_64:

LOAD(without fence) and STORE+MFENCE
LOAD(without fence) and LOCK XCHG
MFENCE+LOAD and STORE(without fence)
LOCK XADD(0) and STORE(without fence)

As it is written here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

C/C++11 Operation x86 implementation

Load Seq_Cst: MOV (from memory)

Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory),MFENCE

Note: there is an alternative mapping of C/C++11 to x86, which instead of locking (or fencing) the Seq Cst store locks/fences the Seq Cst load:

Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE,MOV (from memory)

Store Seq Cst: MOV (into memory)

GCC 4.8.2(GDB in x86_64) uses first(1) approach for C++11-std::memory_order_seq_cst, i.e. LOAD(without fence) and STORE+MFENCE:

std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8  <+0x0058>         mov    0x38(%rsp),%eax
0x4613ec  <+0x005c>         mov    %eax,0x20(%rsp)
0x4613f0  <+0x0060>         mfence

As we know, that MFENCE = LFENCE+SFENCE. Then this code we can rewrite to this: LOAD(without fence) and STORE+LFENCE+SFENCE

Questions:

Why do we need not to use LFENCE here before LOAD, and need to use LFENCE after STORE (because LFENCE make sense only before LOAD!)?
Why GCC does not use approach: LOAD(without fence) and STORE+SFENCE for std::memory_order_seq_cst?

What do you mean with LFENCE before LOAD? In your source code you assign a zero value to a, which is a store and not a load and then it makes no difference if lfence is called before or after the mov instruction. — smossen
@smossen I mean definitely that LFENCE make sense only before LOAD, and LFENCE don't make any sense after STORE in any cases. — Alex
std::memory_order_seq_cst implies lfence+sfence. This triggers synchronization of all other variables that are not declared atomic, thus not calling lfence+sfence (or mfence) when the standard says so would change semantics. If you have a variable "int b;" and another thread has assigned b=1 and then called sfence, this will be visible to this thread first when this thread calls lfence (which could be done by storing a new value into the atomic variable a). — smossen
@smossen and Alex: sfence + lfence is still not a StoreLoad barrier (preshing.com/20120710/… explains how StoreLoad barriers are special). x86 has a strong memory model where LFENCE and SFENCE only exist for use with movnt loads/stores, which are weakly ordered as well as bypassing the cache. See stackoverflow.com/questions/32705169/…. — Peter Cordes

briand briand · Accepted Answer · 2013-10-12T11:30:16

The only reordering x86 does (for normal memory accesses) is that it can potentially reorder a load that follows a store.

SFENCE guarantees that all stores before the fence complete before all stores after the fence. LFENCE guarantees that all loads before the fence complete before all loads after the fence. For normal memory accesses, the ordering guarantees of individual SFENCE or LFENCE operations are already provided by default. Basically, LFENCE and SFENCE by themselves are only useful for the weaker memory access modes of x86.

Neither LFENCE, SFENCE, nor LFENCE + SFENCE prevents a store followed by a load from being reordered. MFENCE does.

The relevant reference is the Intel x86 architectural manual.

Why GCC does not use LOAD(without fence) and STORE+SFENCE for Sequential Consistency?

4 Answers