How are the C++11 memory barriers implemented for x86-like systems?

Question

I have a good conceptual understanding of C++11's std::memory_order types (relaxed vs acquire-release vs sequentially consistent ...), but I'd like to have a better understanding of how they are typically implemented (by a compiler) for x86 (or x86_64) targets.

Specifically, I'd like a comparison of the low-level details (such as the important memory-related CPU instructions for synchronizing state or cache between processors) for each of the order constraints (memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_seq_cst).

Please provide as much low-level detail as possible, preferably for x86_64 or a similar architecture. Your help will be very much appreciated.

Jonathan Wakely · Accepted Answer · 2013-05-29T14:01:54

On x86 and x86_64, ordinary loads already have acquire semantics and ordinary stores already have release semantics, even without using atomics, so for atomic loads and stores every memory order except seq_cst requires no special instructions at all.
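As a rough illustration of that point, here is a minimal message-passing sketch. The names payload, ready, producer and consumer are hypothetical, and the instruction notes in the comments describe what mainstream compilers typically emit for x86-64, not something the standard guarantees.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, chosen only for this illustration.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the data

// Intended to run on the writing thread.
void producer() {
    payload = 42;                                   // plain store
    ready.store(true, std::memory_order_release);   // x86-64: typically a plain mov
}

// Intended to run on the reading thread.
int consumer() {
    // x86-64: the acquire load is typically a plain mov as well.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();                  // wait until the flag is set
    }
    // The acquire load synchronizes with the release store,
    // so the write to payload is visible here.
    return payload;
}
```

Normally producer and consumer would run on two different threads. The memory orders still matter at the source level because they stop the compiler from reordering the accesses, even though no fence instruction is needed.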

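To get full sequential consistency, the compiler can insert an mfence instruction after a seq_cst store (or implement the store with xchg, which is implicitly locked). This prevents the one reordering x86 otherwise allows, a later load completing before an earlier store to a different memory location becomes visible. I don't think any other special instructions are needed.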
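The classic store-then-load pattern shows where the extra instruction matters. This is a minimal sketch with hypothetical names (flag_a, flag_b, thread_a, thread_b); the comments about generated instructions are typical compiler behaviour on x86-64, stated as an assumption rather than a guarantee.

```cpp
#include <atomic>

// Hypothetical names, chosen only for this illustration.
std::atomic<int> flag_a{0};
std::atomic<int> flag_b{0};

// Intended to run on one thread.
int thread_a() {
    // x86-64: typically xchg, or mov followed by mfence.
    flag_a.store(1, std::memory_order_seq_cst);
    // x86-64: typically a plain mov.
    return flag_b.load(std::memory_order_seq_cst);
}

// Intended to run on another thread.
int thread_b() {
    flag_b.store(1, std::memory_order_seq_cst);
    return flag_a.load(std::memory_order_seq_cst);
}

// With seq_cst, at least one thread must observe the other's store,
// so both functions returning 0 is impossible. With release stores and
// acquire loads instead, both returning 0 would be allowed, and x86
// hardware can actually produce that result via its store buffer.
```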

Compilers need to avoid moving loads and stores across atomic operations, but that's purely a limitation on the compiler optimisers and requires no CPU instructions to be issued.
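Standalone fences make the compiler-only nature of most orderings visible. The following is a sketch under the assumption of a typical x86-64 compiler; value_slot, published, publish, try_read and full_barrier are hypothetical names.

```cpp
#include <atomic>

// Hypothetical names, chosen only for this illustration.
std::atomic<int> value_slot{0};
std::atomic<bool> published{false};

void publish(int v) {
    value_slot.store(v, std::memory_order_relaxed);
    // x86-64: typically emits no instruction. It only stops the compiler
    // from moving the store above past the store below.
    std::atomic_thread_fence(std::memory_order_release);
    published.store(true, std::memory_order_relaxed);
}

// Returns -1 if nothing has been published yet.
int try_read() {
    if (!published.load(std::memory_order_relaxed)) {
        return -1;
    }
    // x86-64: typically emits no instruction either.
    std::atomic_thread_fence(std::memory_order_acquire);
    return value_slot.load(std::memory_order_relaxed);
}

void full_barrier() {
    // x86-64: typically mfence or a lock-prefixed instruction,
    // the only fence here that costs a real CPU instruction.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

Comparing the generated assembly for publish and full_barrier (for example with -S) is a quick way to check what a particular compiler actually does.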

See http://www.stdthread.co.uk/forum/index.php?topic=72.0 for some good information.
