Are memory barriers needed because of CPU out-of-order execution or because of a cache consistency problem?

Question

I'm wondering why memory barriers are needed, and I have read some articles on this topic.
Some say it's because of CPU out-of-order execution, while others say it's because of cache consistency problems caused by the store buffer and the invalidate queue.
So what's the real reason memory barriers are needed: CPU out-of-order execution, cache consistency problems, or both? Does CPU out-of-order execution have anything to do with cache consistency? And what's the difference between x86 and ARM?

It has to do with neither specifically. Barriers basically stop new transactions and allow transactions in flight to complete, to avoid race conditions that could cause something undesirable or unpredictable to happen within a specific system design. They let you perform specific transactions on a system that is in a known state. — old_timer
Normally, with all the parallel things going on, it is essentially controlled chaos; a barrier pauses the chaos. It's like stopping traffic to help a slow or elderly person across the road, after which the chaos can continue. — old_timer
Some systems have separate instruction barriers and data barriers to handle or isolate the different areas. The places where you need them are very specific to a system. That doesn't mean "x86 this and ARM that" or "cache this and pipeline that", but rather that this specific x86 processor, or this specific ARM core implemented in this way, needs a barrier before performing this operation. Not all x86 processors or ARM cores need one in that place for that operation. Barriers are used to prevent potential race conditions from causing undesirable or unpredictable results. — old_timer

Peter Cordes · Accepted Answer · 2020-09-19T15:44:50

You need barriers to order this core / thread's accesses to globally-visible coherent cache when the ISA's memory ordering rules are weaker than the semantics you need for your algorithm.

Cache is always coherent, but that's a separate thing from consistency (ordering between multiple operations).

You can have memory reordering on an in-order CPU. In more detail, "How is load->store reordering possible with in-order commit?" shows how you can get memory reordering on a pipeline that starts executing instructions in program order, but with a cache that allows hit-under-miss and/or a store buffer allowing OoO commit.

"Does an x86 CPU reorder instructions?" talks about the difference between memory reordering and out-of-order exec. (And how x86's strongly ordered memory model is implemented on top of aggressive out-of-order execution by having hardware track ordering, with the store buffer decoupling store execution from store visibility to other threads/cores.)
x86 memory ordering: Loads Reordered with Earlier Stores vs. Intra-Processor Forwarding
Globally Invisible load instructions

See also https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ and https://preshing.com/20120930/weak-vs-strong-memory-models for some more basics. x86 has a "strong" memory ordering model: program order plus a store buffer with store-forwarding. On x86, C++ acquire and release are "free"; only atomic RMWs and seq_cst stores need barriers.
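
As a rough illustration of why seq_cst stores are the case that costs something on x86, here is a sketch of the classic "store buffering" litmus test. The function name `count_both_zero_seq_cst` and the trial loop are made up for this example. Each thread stores to one variable and then loads the other. With seq_cst on both operations, the C++ memory model forbids the outcome where both loads see 0, so this function should always return 0. With only release stores and acquire loads, that outcome would be allowed, and it can show up on real x86 hardware because a store can still be sitting in the store buffer when the later load executes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and counts how often both threads read 0.
// With seq_cst stores and loads, the count must stay at 0.
int count_both_zero_seq_cst(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);   // store, then...
            r1 = y.load(std::memory_order_seq_cst);  // ...load the other variable
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();

        // Both loads seeing 0 would mean each load was ordered
        // before the other thread's store (StoreLoad reordering).
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    assert(both_zero == 0);  // guaranteed by seq_cst
    return both_zero;
}
```

Swapping the orders to `memory_order_release` / `memory_order_acquire` (and dropping the assert) turns this into an experiment rather than a guarantee: the both-zero result becomes legal, though how often it appears depends on timing and hardware.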

ARM has a "weak" memory ordering model: only C++ memory_order_consume (data dependency ordering) is "free"; acquire and release require special instructions (like AArch64 ldar / stlr) or barriers.
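
To make the acquire / release case concrete, here is a minimal message-passing sketch. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are invented for this example. The release store publishes the plain write to `payload`, and the acquire load that sees `true` is guaranteed to also see that write. On x86 these typically compile to ordinary loads and stores; on AArch64 a compiler would typically use something like stlr / ldar, which is the ordering the ISA does not give you by default.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish it
}

int consumer() {
    // Spin until the flag is set; acquire pairs with the release store.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // happens-after the write in producer()
}

int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer(producer);
    reader.join();
    writer.join();

    assert(seen == 42);  // guaranteed by release/acquire synchronization
    return seen;
}
```

If both operations on `ready` were `memory_order_relaxed` instead, reading `payload` would be a data race in C++, and on a weakly ordered machine the reader could observe the flag before the data.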
