The hint as to why this works can be found in the first sentence of the description on the page you linked (emphasis mine):
> `std::memory_order` specifies how memory accesses, including regular, non-atomic memory accesses, are to be ordered **around** an atomic operation.
Notice how this talks not about the memory access on the atomic itself, but rather about the memory accesses surrounding it. Concurrent accesses to a single atomic always have strict ordering requirements; otherwise it would be impossible to reason about their behavior in the first place.
In the case of the counter, you get the guarantee that `fetch_add` will behave pretty much as expected: the counter is incremented one step at a time, no values are skipped, and no value is counted twice. You can easily verify this by inspecting the return values of the individual `fetch_add` calls. You get these guarantees always, regardless of the memory ordering.
Things get interesting as soon as you assign meaning to those counter values in the context of the surrounding program logic. For instance, you could use a certain counter value to indicate that a particular piece of data has been made available by an earlier computation step. If that relationship between the counter and the data needs to hold across threads, you need memory orderings: with relaxed ordering, at the point where you observe the counter value you are waiting for, you have no guarantee that the data you are waiting for is ready as well. Even if the producing thread sets the counter only after it has written the data, that ordering of memory operations does not automatically translate across thread boundaries. You need to specify a memory order that orders the write to the data with respect to the change of the counter across threads.
The crucial thing to understand here is that while the operations are guaranteed to happen in a certain order within one thread, that ordering is no longer guaranteed when observing the same data from a different thread.
So the rule of thumb is: If you're only manipulating an atomic in isolation, you don't need any ordering. As soon as that manipulation is interpreted in the context of other unrelated memory accesses (even if those accesses are themselves atomics!) you need to worry about using the correct ordering.
The usual advice applies: unless you have really, really, really good reasons for doing otherwise, just stick with the default `memory_order_seq_cst`. As an application developer you don't want to mess with memory orderings unless you have strong empirical evidence that it is worth the trouble you will undoubtedly run into.