I have some questions about the C++11/C11 memory model that I hope someone can clarify. These are questions about the model/abstract machine, not about any real architecture.
- Are acquire/release effects guaranteed to "cascade" from one thread to the next?
Here is a pseudocode example of what I mean (assume all variables start at 0):
[Thread 1]
store_relaxed(x, 1);
store_release(a, 1);
[Thread 2]
while (load_acquire(a) == 0);
store_release(b, 1);
[Thread 3]
while (load_acquire(b) == 0);
assert(load_relaxed(x) == 1);
Thread 3's acquire synchronizes with Thread 2's release, which is sequenced after Thread 2's acquire, which in turn synchronizes with Thread 1's release. Therefore, Thread 3 is guaranteed to see the value that Thread 1 stored to x, correct? Or do we need seq_cst here to guarantee that the assert will not fire? I have a feeling acquire/release is enough, but I can't find a simple explanation that guarantees it. Most explanations of acquire/release focus on the acquiring thread seeing all the stores made by the releasing thread. In the example above, however, Thread 2 never touches x, and Thread 1 and Thread 3 never touch the same atomic variable. It's obvious that if Thread 2 were to load x, it would see 1, but is that visibility guaranteed to cascade to other threads that subsequently do an acquire/release synchronization with Thread 2? Or does Thread 3 also need to do an acquire on a in order to see Thread 1's write to x?
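For concreteness, here is roughly how I would write the example above in real C++11. This is only a sketch: the function name run_cascade_once, the use of lambdas with std::thread, and returning the observed value instead of asserting inside Thread 3 are my own choices. The memory orders are the same as in the pseudocode.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the three-thread example once and returns the value of x that
// Thread 3 observed. The function name is hypothetical.
int run_cascade_once() {
    std::atomic<int> x{0}, a{0}, b{0};
    int seen = -1;  // written only by Thread 3, read only after join()

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        a.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        // Spin until Thread 1's release store to a is observed.
        while (a.load(std::memory_order_acquire) == 0) {}
        b.store(1, std::memory_order_release);
    });
    std::thread t3([&] {
        // Spin until Thread 2's release store to b is observed.
        while (b.load(std::memory_order_acquire) == 0) {}
        seen = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    t3.join();
    return seen;
}
```

If acquire/release really does cascade, run_cascade_once() must return 1 on every run. A single passing run proves nothing, of course, since I am asking about what the abstract machine guarantees and not about what my hardware happens to do.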
According to https://en.cppreference.com/w/cpp/atomic/memory_order:
(release) "All writes in the current thread are visible in other threads that acquire the same atomic variable"
(acquire) "All writes in other threads that release the same atomic variable are visible in the current thread"
Since Thread 1 and Thread 3 don't touch the same atomic variable, I'm not sure if acquire/release alone is enough for the above case. There's probably an answer hiding in the formal description, but I can't quite work it out.
EDIT: I didn't notice until after posting, but there is an example at the link above ("The following example demonstrates transitive release-acquire ordering...") that is almost the same as mine. However, it uses the same atomic variable across all three threads, which seems like it might be significant. I am specifically asking about the case where the variables are not the same.
- Am I right in believing that according to the standard, there must always be a pair of non-relaxed atomic operations, one in each thread, in order for any kind of memory ordering at all to be guaranteed?
Imagine there is a function "get_data" that allocates a buffer, writes some data to it, and returns a pointer to the buffer. And there is a function "use_data" that takes the pointer to the buffer and does something with the data. Thread 1 gets a buffer from get_data and passes it to Thread 2 using a relaxed atomic store to a global atomic pointer. Thread 2 does relaxed atomic loads in a loop until it gets the pointer, and then passes it off to use_data:
int* get_data() {...}
void use_data(int* buf) {...}
std::atomic<int*> global_ptr{nullptr};
[Thread 1]
int* buf = get_data();
super_duper_memory_fence();
store_relaxed(global_ptr, buf);
[Thread 2]
int* buf = nullptr;
while ((buf = load_relaxed(global_ptr)) == nullptr);
use_data(buf);
Is there any kind of operation at all that can be put in "super_duper_memory_fence", that will guarantee that by the time use_data gets the pointer, the data in the buffer is also visible? It is my understanding that there is not a portable way to do this, and that Thread 2 must have a matching fence or other atomic operation in order to guarantee that it receives the writes made into the buffer and not just the pointer value. Is this correct?
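To make the last point concrete, this is the variant with a matching fence in Thread 2 that, as I understand it, is guaranteed to work: a release fence before the relaxed store pairs with an acquire fence after the relaxed load. It is a sketch only. The bodies of get_data and use_data are made up (use_data returns a sum here just so there is something to check), and run_fence_pair_once is a hypothetical wrapper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Made-up body: allocates a buffer and fills it with 1, 2, 3, 4.
int* get_data() {
    int* buf = new int[4];
    for (int i = 0; i < 4; ++i) buf[i] = i + 1;
    return buf;
}

// Made-up body: sums the four elements and frees the buffer.
int use_data(int* buf) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += buf[i];
    delete[] buf;
    return sum;
}

// Hypothetical wrapper: runs both threads once and returns what
// use_data computed in Thread 2.
int run_fence_pair_once() {
    std::atomic<int*> global_ptr{nullptr};
    int result = 0;  // written only by Thread 2, read only after join()

    std::thread t1([&] {
        int* buf = get_data();
        // Release fence: orders the buffer writes before the relaxed store.
        std::atomic_thread_fence(std::memory_order_release);
        global_ptr.store(buf, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        int* buf = nullptr;
        while ((buf = global_ptr.load(std::memory_order_relaxed)) == nullptr) {}
        // Matching acquire fence: orders the relaxed load before the reads
        // of the buffer inside use_data.
        std::atomic_thread_fence(std::memory_order_acquire);
        result = use_data(buf);
    });

    t1.join();
    t2.join();
    return result;
}
```

What I am asking is whether the acquire fence in Thread 2 can be dropped if something strong enough is placed in Thread 1. My understanding is that it cannot, at least not portably.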