Consider an atomic read-modify-write operation such as x.exchange(..., std::memory_order_acq_rel). For purposes of ordering with respect to loads and stores to other objects, is this treated as:
1. a single operation with acquire-release semantics?
2. Or, as an acquire load followed by a release store, with the added guarantee that other loads and stores to x will observe both of them or neither?
If it's #2, then although no other operations in the same thread could be reordered before the load or after the store, it leaves open the possibility that they could be reordered in between the two.
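To make interpretation #2 concrete, here is a minimal sketch that spells out an exchange as two separately ordered steps: an acquire load, then a release write that only succeeds if x still holds the value that was read. The names exchange_as_two_steps and example_use are hypothetical, and the sketch assumes C++17 or later (where the failure ordering of a compare-exchange may be stronger than the success ordering). It illustrates the model being asked about; it is not a claim about how any standard library implements exchange().

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper modelling interpretation #2: an acquire load followed
// by a release write, retried until no other write to obj gets in between.
inline int exchange_as_two_steps(std::atomic<int>& obj, int desired) {
    // Step 1: acquire load of the current value.
    int expected = obj.load(std::memory_order_acquire);
    // Step 2: release write, succeeding only if obj still equals `expected`.
    // On failure, `expected` is reloaded (with acquire ordering) and we retry.
    while (!obj.compare_exchange_weak(expected, desired,
                                      std::memory_order_release,
                                      std::memory_order_acquire)) {
        // Nothing to do here: `expected` already holds the refreshed value.
    }
    return expected;  // the value that was replaced
}

// Example use: swap in a new value and check that the old one comes back.
inline void example_use() {
    std::atomic<int> v{0};
    int old = exchange_as_two_steps(v, 1);
    assert(old == 0 && v.load() == 1);
    (void)old;  // silences unused-variable warnings when NDEBUG is defined
}
```

Written this way, nothing in the source text itself says that a later relaxed store to another object has to stay after step 2, which is exactly the gap the question is about.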
As a concrete example, consider:
    #include <atomic>
    #include <iostream>

    std::atomic<int> x, y;

    void thread_A() {
        x.exchange(1, std::memory_order_acq_rel);
        y.store(1, std::memory_order_relaxed);
    }

    void thread_B() {
        // These two loads cannot be reordered
        int yy = y.load(std::memory_order_acquire);
        int xx = x.load(std::memory_order_acquire);
        std::cout << xx << ", " << yy << std::endl;
    }
Is it possible for thread_B to output 0, 1?
If the x.exchange() were replaced by x.store(1, std::memory_order_release); then thread_B could certainly output 0, 1. Should the extra implicit load in exchange() rule that out?
cppreference makes it sound like #1 is the case and 0, 1 is forbidden:
A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store.
But I can't find anything explicit in the standard to support this. Actually the standard says very little about atomic read-modify-write operations at all, except 31.4 (10) in N4860 which is just the obvious property that the read has to read the last value written before the write. So although I hate to question cppreference, I'm wondering if this is actually correct.
I'm also looking at how it's implemented on ARM64. Both gcc and clang compile thread_A as essentially
ldaxr [x]
stlxr #1, [x]
str #1, [y]
(See on godbolt.) Based on my understanding of ARM64 semantics, and some tests (with a load of y instead of a store), I think that the str [y] can become visible before the stlxr [x] (though of course not before the ldaxr). This would make it possible for thread_B to observe 0, 1. So if #1 is true then it would seem that gcc and clang are both wrong, which I hesitate to believe.
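For anyone who wants to probe this empirically, a rough harness along the following lines could tally what thread_B observes over many runs. Everything here (run_litmus, Tally, write_with_exchange, write_with_release_store) is a hypothetical name introduced for illustration, not code from my actual tests. The writer is passed in so that the exchange() version and the release-store version mentioned earlier can be compared side by side.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Outcome counts indexed as tally[xx][yy]; tally[0][1] is the "0, 1" case.
using Tally = std::array<std::array<std::size_t, 2>, 2>;

// Hypothetical harness: runs a writer and a reader thread `iterations` times
// on fresh atomics and counts each (xx, yy) pair the reader saw.
// `write_x` is whatever thread A does to x before its relaxed store to y.
template <class WriteX>
Tally run_litmus(std::size_t iterations, WriteX write_x) {
    Tally tally{};
    for (std::size_t i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int xx = -1, yy = -1;
        std::thread a([&] {
            write_x(x);
            y.store(1, std::memory_order_relaxed);
        });
        std::thread b([&] {
            yy = y.load(std::memory_order_acquire);
            xx = x.load(std::memory_order_acquire);
        });
        a.join();
        b.join();  // join() makes the reader's results visible here
        assert((xx == 0 || xx == 1) && (yy == 0 || yy == 1));
        ++tally[static_cast<std::size_t>(xx)][static_cast<std::size_t>(yy)];
    }
    return tally;
}

// The two ways of writing x that the question compares.
inline void write_with_exchange(std::atomic<int>& x) {
    x.exchange(1, std::memory_order_acq_rel);
}
inline void write_with_release_store(std::atomic<int>& x) {
    x.store(1, std::memory_order_release);
}
```

Calling run_litmus(n, write_with_exchange) and reading tally[0][1] gives the number of 0, 1 observations. A count of zero proves nothing: starting fresh threads on every iteration makes the race window very small, and a strongly ordered machine such as x86 may never show the reordering at all. A serious test would keep the threads alive and synchronize their starts, as dedicated litmus-test tools do.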
Finally, as far as I can tell, replacing memory_order_acq_rel with seq_cst wouldn't change anything about this analysis, since it only adds semantics with respect to other seq_cst operations, and we don't have any here.
I found "What exact rules in the C++ memory model prevent reordering before acquire operations?" which, if I understand it correctly, seems to agree that #2 is correct, and that 0, 1 could be observed. I'd still appreciate confirmation, as well as a check on whether the cppreference quote is actually wrong or if I'm misunderstanding it.
thread_B performs a load on both x and y, but those are separate operations and as such, do not reflect the current state in thread_A. Regardless of any ordering, if x is loaded when thread_A has not done anything yet and y is loaded when thread_A has finished, you can get the 0,1 output – LWimsey

[...] 0,1 is forbidden, but based on how acquire operations enforce ordering, I don't believe 0,1 is possible in this case (edit #2). – LWimsey

[...] y.store() in A doesn't synchronize with the y.load() in B, because 31.4 (2) only guarantees that if the y.store() were release? And of course if the y.store() were release then there would be no problem and we could definitely not get 0, 1. – Nate Eldredge

[...] ldaxr / stlxr / ldr being reordered. I can try to clean it up and post it later. Haven't been able to do the same with ldaxr / stlxr / str however. – Nate Eldredge