No, standard C++11 doesn't guarantee that memory_order_seq_cst
prevents StoreLoad reordering of non-atomic
around an atomic(seq_cst)
.
Even standard C++11 doesn't guarantee that memory_order_seq_cst
prevents StoreLoad reordering of atomic(non-seq_cst)
around an atomic(seq_cst)
.
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
- There shall be a single total order S on all
memory_order_seq_cst
operations - C++11 Standard:
§ 29.3
3
There shall be a single total order S on all memory_order_seq_cst
operations, consistent with the “happens before” order and
modification orders for all affected locations, such that each
memory_order_seq_cst operation B that loads a value from an atomic
object M observes one of the following values: ...
- But, any atomic operations with ordering weaker than
memory_order_seq_cst
hasn't sequential consistency and hasn't single total order, i.e. non-memory_order_seq_cst
operations can be reordered with memory_order_seq_cst
operations in allowed directions - C++11 Standard:
§ 29.3
8 [ Note: memory_order_seq_cst ensures sequential consistency
only for a program that is free of data races and uses exclusively
memory_order_seq_cst operations. Any use of weaker ordering will
invalidate this guarantee unless extreme care is used. In particular,
memory_order_seq_cst fences ensure a total order only for the fences
themselves. Fences cannot, in general, be used to restore sequential
consistency for atomic operations with weaker ordering specifications.
— end note ]
Also C++-compilers allows such reorderings:
- On x86_64
Usually - if in compilers seq_cst implemented as barrier after store, then:
STORE-C(relaxed);
LOAD-B(seq_cst);
can be reordered to LOAD-B(seq_cst);
STORE-C(relaxed);
Screenshot of Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby
Also, theoretically possible - if in compilers seq_cst implemented as barrier before load, then:
STORE-A(seq_cst);
LOAD-C(acq_rel);
can be reordered to LOAD-C(acq_rel);
STORE-A(seq_cst);
- On PowerPC
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to LOAD-C(relaxed);
STORE-A(seq_cst);
Also on PowerPC can be such reordering:
STORE-A(seq_cst);
STORE-C(relaxed);
can reordered to STORE-C(relaxed);
STORE-A(seq_cst);
If even atomic variables are allowed to be reordered across atomic(seq_cst), then non-atomic variables can also be reordered across atomic(seq_cst).
Screenshot of Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8
More details:
- On x86_64
STORE-C(release);
LOAD-B(seq_cst);
can be reordered to LOAD-B(seq_cst);
STORE-C(release);
Intel® 64 and IA-32 Architectures
8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations
I.e. x86_64 code:
STORE-A(seq_cst);
STORE-C(release);
LOAD-B(seq_cst);
Can be reordered to:
STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release);
This can happen because between c.store
and b.load
isn't mfence
:
x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c;
a.store(2, std::memory_order_seq_cst); // movl 2,[a]; mfence;
c.store(4, std::memory_order_release); // movl 4,[c];
int tmp = b.load(std::memory_order_seq_cst); // movl [b],[tmp];
}
It can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c;
a.store(2, std::memory_order_seq_cst); // movl 2,[a]; mfence;
int tmp = b.load(std::memory_order_seq_cst); // movl [b],[tmp];
c.store(4, std::memory_order_release); // movl 4,[c];
}
Also, Sequential Consistency in x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
LOAD
(without fence) and STORE
+ MFENCE
LOAD
(without fence) and LOCK XCHG
MFENCE
+ LOAD
and STORE
(without fence)
LOCK XADD
( 0 ) and STORE
(without fence)
- 1 and 2 ways:
LOAD
and (STORE
+MFENCE
)/(LOCK XCHG
) - we reviewed above
- 3 and 4 ways: (
MFENCE
+LOAD
)/LOCK XADD
and STORE
- allow next reordering:
STORE-A(seq_cst);
LOAD-C(acq_rel);
can be reordered to LOAD-C(acq_rel);
STORE-A(seq_cst);
- On PowerPC
STORE-A(seq_cst);
LOAD-C(relaxed);
can be reordered to LOAD-C(relaxed);
STORE-A(seq_cst);
Allows Store-Load reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
Stores Reordered After Loads
I.e. PowerPC code:
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-C(relaxed);
LOAD-B(seq_cst);
Can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-B(seq_cst);
PowerPC - GCC 4.8 : https://godbolt.org/g/xowFD3
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
a.store(2, std::memory_order_seq_cst); // li r9<-2; sync; stw r9->[a];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
c.load(std::memory_order_relaxed); // lwz r9<-[c];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
By dividing a.store
into two parts - it can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
//a.store(2, std::memory_order_seq_cst); // part-1: li r9<-2; sync;
c.load(std::memory_order_relaxed); // lwz r9<-[c];
a.store(2, std::memory_order_seq_cst); // part-2: stw r9->[a];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
Where load-from-memory lwz r9<-[c];
executed earlier than store-to-memory stw r9->[a];
.
Also on PowerPC can be such reordering:
STORE-A(seq_cst);
STORE-C(relaxed);
can reordered to STORE-C(relaxed);
STORE-A(seq_cst);
Because PowerPC has weak memory ordering model - allows Store-Store reordering (Table 5 - PowerPC): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
Stores Reordered After Stores
I.e. on PowerPC operations Store can be reordered with other Store, then previous example can be reordered such as:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
//a.store(2, std::memory_order_seq_cst); // part-1: li r9<-2; sync;
c.load(std::memory_order_relaxed); // lwz r9<-[c];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
a.store(2, std::memory_order_seq_cst); // part-2: stw r9->[a];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
Where store-to-memory stw r9->[c];
executed earlier than store-to-memory stw r9->[a];
.