0
votes

I have been studying the memory order semantics in C++11 and am having some difficulty understanding how memory_order_acquire works at the CPU level.

According to cppreference:

A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below)

The part I really can't understand is:

no reads or writes in the current thread can be reordered before this load.

What happens if the CPU has already reordered instructions before it even reaches the 'memory_order_acquire' part? Does the CPU revert all the work it has done? How can this be guaranteed?

Thank you.

2
I think you may be misinterpreting that line. It means that later reads/writes can't be reordered so that they occur before this load, not that earlier reads/writes can't be reordered relative to each other. – happydave
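To make the distinction in this comment concrete, here is a minimal sketch of release-acquire message passing. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are made up for illustration and do not come from the question. The only ordering the acquire load enforces is that the later read of `payload` cannot move ahead of it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

void producer() {
    payload = 42;                                  // (1) ordinary write
    ready.store(true, std::memory_order_release);  // (2) release store; (1) cannot move after it
}

int consumer() {
    // (3) acquire load: spin until the value stored in (2) is observed
    while (!ready.load(std::memory_order_acquire)) {
    }
    // (4) this later read cannot be reordered before (3), so it sees 42
    return payload;
}

// Hypothetical driver: runs both threads once and returns what the consumer read.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

Nothing here constrains how accesses that come before the acquire load are ordered among themselves; the guarantee only concerns accesses that come after it in program order.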

2 Answers

1
votes

CPUs don't "reach" the memory_order_acquire part. It is an instruction to the compiler, which has to translate it into machine code using its knowledge of the CPU's memory model.

For instance, if a hypothetical CPU would only ever reorder across a maximum of 2 instructions, inserting 2 NOP instructions would be a rather trivial way to achieve that part of the semantics.
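As a rough illustration of the point that the ordering is a request to the compiler, the sketch below writes the same intent in two ways: an acquire load, and a relaxed load followed by an acquire fence. The names `published`, `shared_value` and both functions are hypothetical. What the compiler emits depends on the target: on x86-64, mainstream compilers typically generate an ordinary `mov` for the acquire load, while on AArch64 they typically use a load-acquire instruction such as `ldar`.

```cpp
#include <atomic>

std::atomic<int> published{0};  // set to 1 once shared_value is valid
int shared_value = 0;           // plain data guarded by the flag

// Variant A: the ordering is attached to the load itself.
int read_with_acquire_load() {
    if (published.load(std::memory_order_acquire) == 1) {
        return shared_value;  // cannot be reordered before the load above
    }
    return -1;  // not published yet
}

// Variant B: a relaxed load followed by a standalone acquire fence.
int read_with_acquire_fence() {
    if (published.load(std::memory_order_relaxed) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        return shared_value;  // cannot be reordered before the fence
    }
    return -1;  // not published yet
}
```

In both variants the source code only states the constraint; choosing the instruction or barrier that satisfies it on a given CPU is the compiler's job.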

0
votes

As noted in the second paragraph here:

The instructions of the program may not be run in the correct order, as long as the end result is correct.

OoOE doesn't just blindly execute anything that's available. The CPU contains logic that expressly prohibits reordering those accesses across the boundary. As noted elsewhere in that article, the silicon cost of OoOE is quite high, quite likely due to issues of this sort.

As noted in this SO question, memory barriers do come with a cost, which makes a lot of sense in light of the above: they cause the normal OoOE pipeline to take a hit.