I have a good conceptual understanding of C++11's std::memory_order types (relaxed vs acquire-release vs sequentially consistent ...), but I'd like to have a better understanding of how they are typically implemented (by a compiler) for x86 (or x86_64) targets.
Specifically, I'm looking for a comparison of the low-level details (such as the memory-related CPU instructions used to synchronize state or caches between processors) for each of the ordering constraints (memory_order_consume, memory_order_acquire, memory_order_release, and memory_order_seq_cst).
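To make the question concrete, here is a minimal sketch of the kind of operations whose generated code I'd like to compare. The variable and function names are made up for illustration; each function isolates one ordering constraint, so compiling with something like g++ -O2 -S should show which instructions the compiler picks for it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, used only for illustration.
std::atomic<int> shared_value{0};

// One function per ordering constraint, so each can be inspected
// separately in the generated assembly.
void store_relaxed(int v) { shared_value.store(v, std::memory_order_relaxed); }
void store_release(int v) { shared_value.store(v, std::memory_order_release); }
void store_seq_cst(int v) { shared_value.store(v, std::memory_order_seq_cst); }

int load_relaxed() { return shared_value.load(std::memory_order_relaxed); }
int load_consume() { return shared_value.load(std::memory_order_consume); }
int load_acquire() { return shared_value.load(std::memory_order_acquire); }
int load_seq_cst() { return shared_value.load(std::memory_order_seq_cst); }

// A read-modify-write operation, for comparison with the plain
// loads and stores above. Returns the value held before the add.
int add_seq_cst(int delta) {
    return shared_value.fetch_add(delta, std::memory_order_seq_cst);
}

// Single-threaded sanity check: every variant reads back what was
// last written. This says nothing about cross-thread ordering; it
// only confirms the sketch compiles and behaves as plain code.
void self_check() {
    store_release(7);
    assert(load_acquire() == 7);
    store_seq_cst(9);
    assert(load_seq_cst() == 9);
}
```

What I'm after is which of these functions compile to different instructions on x86_64, and why.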
Please provide as much low-level detail as possible, preferably for x86_64 or a similar architecture. Your help will be very much appreciated.