Is it possible to use memory barriers only on the storing side

Question

First, some context: I'm working with a pre-C11, inline-asm-based atomic model, but for the purposes of this I'm happy to ignore the C aspect (and any compiler barrier issues, which I can deal with separately) and consider it essentially just an asm/cpu-architecture question.

Suppose I have code that looks like:

various stores
barrier
store flag
barrier

I want to be able to read flag from another cpu core and conclude that the various stores were already performed and made visible. Is it possible to do so without any kind of memory barrier instruction on the loading side? Clearly it's possible at least on some cpu architectures, for example x86 where an explicit memory barrier is not needed on either core. But what about in general? Does it vary widely by cpu arch whether this is possible?

AFAIK, Alpha needs barriers, while ARM/PPC need either barriers, or address/data dependencies, or RW control dependencies, or RR control dependencies + ISYNC/ISB between the read of flag and the operation that depends on it. For ARM/PPC, you may be interested in "A Tutorial Introduction to the ARM and POWER Relaxed Memory Models". — ninjalj
Another data point: according to the consume memory order proposal at open-std.org/jtc1/sc22/wg14/www/docs/n1444.htm, some embedded MIPS CPUs can also avoid barriers by using dependencies (older, "true" MIPS are supposedly seq-cst). Also, given that smp_read_barrier_depends() in the Linux kernel is only a barrier for Alpha, it seems that if there is a (possibly fake) address dependency on the reading side, the read barrier can be avoided (save for Alpha). Making the compiler preserve the dependency is a whole other issue. — ninjalj
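
To make the dependency idea in the comments above concrete, here is a minimal sketch in C11 rather than the asker's pre-C11 inline asm. It assumes a toolchain with <stdatomic.h> and POSIX threads, and every name in it (struct msg, slot, published, dep_writer, dep_reader, run_dependent_read) is made up for illustration. The reader loads a pointer with memory_order_consume and then dereferences it, so the second load carries an address dependency on the first. Note that mainstream compilers currently treat memory_order_consume as memory_order_acquire, so on ARM/POWER this may still emit a barrier; the sketch shows the shape of the technique, not a guaranteed barrier-free reader.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type; stands in for the "various stores". */
struct msg { int value; };

static struct msg slot;
static struct msg *_Atomic published;   /* NULL until the writer publishes */

static void *dep_writer(void *arg)
{
    (void)arg;
    slot.value = 42;                    /* various stores */
    /* Release store: orders the payload store before the pointer store. */
    atomic_store_explicit(&published, &slot, memory_order_release);
    return NULL;
}

static void *dep_reader(void *arg)
{
    int *seen = arg;
    struct msg *p;

    /* Spin until the pointer is published. */
    while ((p = atomic_load_explicit(&published, memory_order_consume)) == NULL)
        ;
    /* The address of this load depends on the value of p loaded above. */
    *seen = p->value;
    return NULL;
}

/* Runs one writer and one reader; returns the value the reader observed. */
int run_dependent_read(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    slot.value = 0;
    atomic_store(&published, NULL);

    rc = pthread_create(&r, NULL, dep_reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, dep_writer, NULL);
    assert(rc == 0);

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Calling run_dependent_read() from a program built with -pthread returns the value the reader saw through the published pointer. The point of the design is that the flag itself is the pointer: a plain integer flag gives the CPU no address to depend on, which is why the comment mentions a "(possibly fake)" dependency.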

NPE · Accepted Answer · 2014-10-10T06:09:16

If a CPU were to reorder the loads, your code would require a load barrier in order to work correctly. There are plenty of architectures that do such reordering; see the table in Memory ordering for some examples.

Thus in the general case your code does require load barriers.

x86 is not very typical in that it provides pretty stringent memory ordering guarantees. See Who ordered memory fences on an x86? for a discussion.
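
As a rough illustration of the general case, here is a sketch of the store-side barrier paired with a load-side barrier. It is written with C11 fences instead of the asker's inline asm, assumes <stdatomic.h> and POSIX threads are available, and uses made-up names (payload, flag, writer, reader, run_handoff). The acquire fence in the reader plays the role of the load barrier: without it, the C11 model gives no guarantee that the payload read happens after the flag read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* stands in for the "various stores" */
static atomic_int flag;    /* static storage, so it starts at 0 */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* various stores */
    atomic_thread_fence(memory_order_release);               /* barrier */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);   /* store flag */
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;

    /* Spin until the flag store becomes visible. */
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;
    /* Load-side barrier: keeps the payload read after the flag read. */
    atomic_thread_fence(memory_order_acquire);
    *seen = payload;
    return NULL;
}

/* Runs one writer and one reader; returns the payload the reader observed. */
int run_handoff(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    payload = 0;
    atomic_store(&flag, 0);

    rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Built with -pthread, run_handoff() returns the payload value the reader observed after seeing the flag. On x86 the two fences typically compile to no instruction at all and only constrain the compiler, which matches the point that x86 needs no explicit barrier here; on weaker architectures the acquire fence is expected to become a real barrier instruction.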
