I was under the impression that memory loads could not be hoisted above an acquiring load in the C++11 memory model. However, looking at the code that gcc 4.8 produces, that only seems to be true for other atomic loads, not all of memory. If that's true and acquiring loads don't synchronize all memory (just std::atomics), then I'm not sure how it would be possible to implement general-purpose mutexes in terms of std::atomic.
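As a concrete version of the mutex concern, here is a minimal sketch of a spinlock built only on std::atomic that guards a plain int. It assumes the standard's rule that a release store synchronizes with an acquire operation that reads the stored value, which in turn orders the non-atomic accesses on either side. SpinLock, counter, add_many and run_two_threads are hypothetical names introduced for illustration; this is not how any particular library implements its mutex.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spinlock for illustration only.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // Acquire exchange: spin until we are the one who flips false -> true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // busy-wait until the current owner releases
        }
    }
    void unlock() {
        // Release store: publishes everything written inside the critical section.
        locked.store(false, std::memory_order_release);
    }
};

SpinLock counter_lock;
int counter = 0;  // plain, non-atomic data protected by counter_lock

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        counter_lock.lock();
        ++counter;
        counter_lock.unlock();
    }
}

// Runs two incrementing threads and returns the final count.
int run_two_threads(int n) {
    counter = 0;
    std::thread a(add_many, n);
    std::thread b(add_many, n);
    a.join();
    b.join();
    return counter;
}
```

If acquire operations ordered only other atomics, accesses to counter could move outside the critical section and the final count would not be reliable.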
The following code:
    #include <atomic>

    extern std::atomic<unsigned> seq;
    extern std::atomic<int> data;

    int reader() {
        int data_copy;
        unsigned seq0;
        unsigned seq1;
        do {
            seq0 = seq.load(std::memory_order_acquire);
            data_copy = data.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            seq1 = seq.load(std::memory_order_relaxed);
        } while (seq0 != seq1);
        return data_copy;
    }
Produces:
    _Z6readerv:
    .L3:
        mov ecx, DWORD PTR seq[rip]
        mov eax, DWORD PTR data[rip]
        mov edx, DWORD PTR seq[rip]
        cmp ecx, edx
        jne .L3
        rep ret
Which looks correct to me.
However, changing data to be an int rather than a std::atomic:
    #include <atomic>

    extern std::atomic<unsigned> seq;
    extern int data;

    int reader() {
        int data_copy;
        unsigned seq0;
        unsigned seq1;
        do {
            seq0 = seq.load(std::memory_order_acquire);
            data_copy = data;
            std::atomic_thread_fence(std::memory_order_acquire);
            seq1 = seq.load(std::memory_order_relaxed);
        } while (seq0 != seq1);
        return data_copy;
    }
Produces this:
    _Z6readerv:
        mov eax, DWORD PTR data[rip]
    .L3:
        mov ecx, DWORD PTR seq[rip]
        mov edx, DWORD PTR seq[rip]
        cmp ecx, edx
        jne .L3
        rep ret
So what's going on?
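For context, the reader loops above are the read side of a seqlock. Below is a minimal sketch of a matching write side for the std::atomic<int> version, assuming a single writer thread. The writer function is hypothetical (the post only shows the reader), and seq and data are defined here rather than declared extern so the example is self-contained. Seqlock readers commonly also retry while seq0 is odd; the reader here is kept as posted.

```cpp
#include <atomic>

std::atomic<unsigned> seq{0};
std::atomic<int> data{0};

// Hypothetical writer; assumes exactly one thread ever calls it.
void writer(int value) {
    unsigned s = seq.load(std::memory_order_relaxed);
    // Odd sequence number marks a write in progress.
    seq.store(s + 1, std::memory_order_relaxed);
    // Release fence: a reader whose data load sees `value` and then executes
    // its acquire fence will observe seq >= s + 1 in its second seq load.
    std::atomic_thread_fence(std::memory_order_release);
    data.store(value, std::memory_order_relaxed);
    // Even sequence number marks the write as complete.
    seq.store(s + 2, std::memory_order_release);
}

// Reader as in the post (atomic data version).
int reader() {
    int data_copy;
    unsigned seq0;
    unsigned seq1;
    do {
        seq0 = seq.load(std::memory_order_acquire);
        data_copy = data.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        seq1 = seq.load(std::memory_order_relaxed);
    } while (seq0 != seq1);
    return data_copy;
}
```

Keeping data atomic with relaxed ordering means the concurrent read and write are not a data race, which is the property the plain int version gives up.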
… load(rel); fence(acq); in second version, does its output asm change? – yohjp
… seq0? If so then no, it doesn't affect the code generated at all. – jleahy
… seq1. An "acquire fence" with acquire semantics consists of the operation order seq1.load(relaxed) -> fence(acquire), not fence(acquire) -> seq1.load(relaxed), in the C++11 memory model. A C++ fence only influences the happens-before relationship between atomic operations and/or fences; it has no direct impact on non-atomic variables. In this respect, a C++ fence is quite different from a processor's or compiler's memory barrier instruction (like mfence on x86). – yohjp
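The ordering described in the last comment, a relaxed load followed by an acquire fence, can be sketched as a simple hand-off. This is a minimal illustration, assuming the fence-to-fence synchronization rule of the C++11 model: a release fence sequenced before a relaxed store synchronizes with an acquire fence sequenced after a relaxed load that reads that store. The names payload, ready, producer, consumer and run_handoff are hypothetical.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag the fences attach to

void producer() {
    payload = 42;
    // Release fence placed before the relaxed store of the flag.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    // Relaxed load first, spinning until the flag is observed...
    while (!ready.load(std::memory_order_relaxed)) {
    }
    // ...then the acquire fence: load(relaxed) -> fence(acquire).
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;  // ordered after the producer's write via the two fences
}

// Runs one producer and one consumer and returns what the consumer read.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

The fences act through the atomic flag: the read of payload is safe only because the relaxed load of ready precedes the acquire fence, not because the fence names payload in any way.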