What is the difference in logic and performance between the x86 instructions LOCK XCHG and MOV+MFENCE when used to do a sequentially consistent store?
(We ignore the load result of the XCHG; compilers other than gcc use it for the store + memory barrier effect.)
Is it true that, for sequential consistency, during the execution of an atomic operation LOCK XCHG locks only a single cache line, whereas MOV+MFENCE locks the whole L3 cache (LLC)?
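For concreteness, here is a minimal C++11 sketch of the two forms being compared. The names `flag`, `sc_store`, `sc_store_via_exchange` and `stores_are_visible` are made up for illustration, and the instruction selection mentioned in the comments is what x86 compilers typically emit, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared variable, used only for illustration.
std::atomic<int> flag{0};

// Sequentially consistent store. On x86 a compiler typically lowers this
// either to MOV followed by MFENCE, or to a single XCHG with memory.
void sc_store(int v) {
    flag.store(v, std::memory_order_seq_cst);
}

// The same store expressed as an exchange whose loaded result is discarded.
// This typically compiles to XCHG; with a memory operand XCHG is implicitly
// locked, so no explicit LOCK prefix is needed.
void sc_store_via_exchange(int v) {
    (void)flag.exchange(v, std::memory_order_seq_cst);
}

// Single-threaded sanity check: both forms leave the new value in `flag`.
// It says nothing about ordering or cost, which is what the question asks.
bool stores_are_visible() {
    sc_store(1);
    const bool first = (flag.load(std::memory_order_seq_cst) == 1);
    sc_store_via_exchange(2);
    const bool second = (flag.load(std::memory_order_seq_cst) == 2);
    return first && second;
}
```

Compiling both functions with optimizations and inspecting the generated assembly (for example with `-S`) should show which of the two instruction sequences a given compiler picks.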
atomic)/C++11 (std::atomic) for all orderings on x86 except SC (sequential consistency): en.cppreference.com/w/cpp/atomic/memory_order But I said that MFENCE provides sequential consistency for atomic variables, as we can see in C11 (atomic)/C++11 (std::atomic) in GCC 4.8.2: stackoverflow.com/questions/19047327/… - Alex

mov is atomic for unaligned access, by the way.) - Kerrek SB

MOV+MFENCE (SC in GCC 4.8.2) can be replaced with LOCK XCHG for SC, as we can see in the video where, at 0:28:20, it is said that MFENCE is more expensive than XCHG: channel9.msdn.com/Shows/Going+Deep/… - Alex