As is known, PowerPC has a weak memory model that permits any speculative reordering: Store-Store, Load-Store, Store-Load, Load-Load.
There are at least 3 fences:
- hwsync or sync - full memory barrier, prevents any reordering
- lwsync - memory barrier that prevents reordering: Load-Load, Store-Store, Load-Store
- isync - instruction barrier: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_isync_ics_instrs.htm
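For readers coming from C++, the three fences roughly correspond to what GCC and Clang usually emit for std::atomic operations on PowerPC. The sketch below rests on that assumption: the helper names are invented for illustration, and the exact instructions depend on the compiler and target.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper names. The instruction comments describe what GCC and
// Clang usually emit for PowerPC; they are not guaranteed by the C++ standard.

// Full barrier, usually `sync` (hwsync): orders all four combinations,
// including Store-Load.
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Lightweight barrier, usually `lwsync`: orders Load-Load, Store-Store and
// Load-Store, but not Store-Load.
inline void light_fence() {
    std::atomic_thread_fence(std::memory_order_acq_rel);
}

// Acquire load, usually `lwz; cmpw; bne-; isync`: the branch depends on the
// loaded value and isync stops later instructions from starting before it.
inline int load_acquire(const std::atomic<int>& x) {
    return x.load(std::memory_order_acquire);
}

// Single-threaded smoke test: fences never change the values a thread
// reads back from its own stores.
inline int fence_demo() {
    std::atomic<int> x{0};
    x.store(5, std::memory_order_relaxed);
    full_fence();
    light_fence();
    int v = load_acquire(x);
    assert(v == 5);
    return v;
}
```

Only the first helper is expected to order an earlier store against a later load, which is the case this question is about.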
For example, can the Store (stwcx.) and the Load (lwz) be reordered in this code? https://godbolt.org/g/84t5jM
lwarx 9,0,10
addi 9,9,2
stwcx. 9,0,10
bne- 0,.L2
isync
lwz 9,8(1)
As is known, isync prevents reordering of lwarx, bne <--> any following instructions.
But does isync prevent reordering of stwcx., bne <--> any following instructions?
I.e. can the Store (stwcx.) begin earlier than the following Load (lwz), but finish being performed later than the Load (lwz)?
I.e. can the Store (stwcx.) write to the Store-Buffer before the following Load (lwz) begins, while the actual store to the cache, which makes it visible to all CPU-cores, occurs after the Load (lwz) has finished?
As we see from the following documents, articles and books:
- isync is not a memory fence, it is only an instruction fence.
- isync does not force all external accesses to complete with respect to other processors and mechanisms that access memory.
- isync does not wait for all other processors to detect storage accesses.
- isync is a very low-overhead and very weak fence (weaker than lwsync and hwsync).
- isync does not guarantee that previous and future stores will be perceived by other processors in the locally issued order - that requires one of the sync instructions.
- isync is an acquire barrier, but as we know, acquire can be applied only to Load operations, not to a Store (stwcx.).
- isync does not affect data accesses and does not wait for all stores to be performed.
The main question, initially: a=0, b=0
- if CPU-Core-0 does: stwcx. [a]=1; bne-; isync; lwz [b].
- And CPU-Core-1 does: hwsync; stw [b]=1; hwsync; lwz [a]; hwsync.
Then can Core-0 see [b]==0 and Core-1 see [a]==0?
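As a reference point, it helps to list which results a sequentially consistent machine could produce for these two cores. The sketch below is a toy model, not real hardware: it treats stwcx. as a plain store that always succeeds, ignores the fences, and simply tries every interleaving that keeps each core's program order. All names in it are made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Toy model, not real hardware: every step takes effect on shared memory
// immediately, and each core executes its two steps in program order.
struct ScState {
    int a = 0, b = 0;  // shared memory, initially a=0, b=0
    int r_b = -1;      // value Core-0 loads from [b]
    int r_a = -1;      // value Core-1 loads from [a]
};

using Outcome = std::pair<int, int>;  // (Core-0's r_b, Core-1's r_a)

// Run step `pc` (0 = store, 1 = load) of the given core.
inline void sc_step(ScState& s, int core, int pc) {
    assert((core == 0 || core == 1) && (pc == 0 || pc == 1));
    if (core == 0) {
        if (pc == 0) s.a = 1; else s.r_b = s.b;
    } else {
        if (pc == 0) s.b = 1; else s.r_a = s.a;
    }
}

// Try every interleaving that keeps each core's program order.
inline void sc_explore(ScState s, int pc0, int pc1, std::set<Outcome>& out) {
    if (pc0 == 2 && pc1 == 2) {
        out.emplace(s.r_b, s.r_a);
        return;
    }
    if (pc0 < 2) { ScState t = s; sc_step(t, 0, pc0); sc_explore(t, pc0 + 1, pc1, out); }
    if (pc1 < 2) { ScState t = s; sc_step(t, 1, pc1); sc_explore(t, pc0, pc1 + 1, out); }
}

// Collect all (r_b, r_a) pairs reachable under sequential consistency.
inline std::set<Outcome> sc_outcomes() {
    std::set<Outcome> out;
    sc_explore(ScState{}, 0, 0, out);
    return out;
}
```

In this model the pair r_b==0, r_a==0 never appears, while the other three combinations (including r_b==1 with r_a==0) are all reachable. So only both loads returning 0 would indicate that Core-0's store and load took effect out of order.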
Also:
The isync prevents speculative execution from accessing the data block before the flag has been set. And in conjunction with the preceding load, compare, and conditional branch instructions, the isync guarantees that the load on which the branch depends (the load of the flag) is performed prior to any loads that occur subsequent to the isync (loads from the shared block). isync is not a memory barrier instruction, but the load-compare-conditional branch-isync sequence can provide this ordering property.
Unlike isync, sync forces all external accesses to complete with respect to other processors and mechanisms that access memory.
- Storage in the PowerPC Janice M. Stone, Robert P. Fitzgerald, 1995: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.4033&rep=rep1&type=pdf
Unlike sync , isync does not wait for all other processors to detect storage accesses. isync is a less conservative fence than sync because it does not delay until all processors detect previous loads and stores.
bc;isync: this is a very low-overhead and very weak form of memory fence. A specific set of preceding loads on which the bc (branch conditional) instruction depends are guaranteed to have completed before any subsequent instruction begins execution. However, store-buffer and cache-state effects can nevertheless make it appear that subsequent loads occur before the preceding loads upon which the twi instruction depends. That said, the PowerPC architecture does not permit stores to be executed speculatively, so any store following the twi;isync instruction is guaranteed to happen after any of the loads on which the bc depends.
Note that isync does not affect data accesses and does not wait for all stores to be performed.
3.5.7.2 Instruction Cache Block Invalidate (icbi)
As a result of this and other implementation-specific design optimizations, instead of requiring the instruction sequence specified by the Power ISA to be executed on a per cache-line basis, software must only execute a single sequence of three instructions to make any previous code modifications become visible:
sync, icbi (to any address), isync.
ANSWER:
So, isync doesn't guarantee Store-Load order, because "isync is not a memory barrier instruction". Consequently, isync doesn't guarantee that any previous stores will be visible to other CPU-Cores (which use sequential consistency) before the next instruction has finished. The instruction synchronization command isync guarantees only the order in which instructions start, but does not guarantee the order in which they complete, i.e. it does not guarantee the order of their visible effects to other CPU-Cores. That is, isync allows the visible effects of the Store and the Load to be reordered in this code: stwcx. [a]=1; bne-; isync; lwz [b].
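The effect described in this answer can be pictured with a toy store-buffer model. The sketch below is only an illustration under stated assumptions: Core-0 has a one-entry store buffer, stwcx. always succeeds, Core-1's accesses take effect immediately because of its hwsync instructions, and the real propagation rules of POWER are not modelled. The names Machine, sb_explore, sb_outcomes and drain_before_load are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Outcome = std::pair<int, int>;  // (Core-0's r_b, Core-1's r_a)

// Toy machine, an assumption for illustration only: Core-0 owns a one-entry
// store buffer, Core-1's accesses take effect immediately.
struct Machine {
    int a = 0, b = 0;        // shared memory visible to all cores
    int r_b = -1, r_a = -1;  // results of lwz [b] and lwz [a]
    bool issued = false;     // Core-0: store placed in its store buffer
    bool drained = false;    // Core-0: store written to shared memory
    bool loaded = false;     // Core-0: lwz [b] performed
    int pc1 = 0;             // Core-1: 0 = before stw, 1 = before lwz, 2 = done
};

// Explore all schedules. If drain_before_load is true, Core-0's load waits
// until its buffered store is globally visible (modelling a full sync placed
// between the store and the load).
inline void sb_explore(Machine m, bool drain_before_load, std::set<Outcome>& out) {
    if (m.drained && m.loaded && m.pc1 == 2) {
        out.emplace(m.r_b, m.r_a);
        return;
    }
    if (!m.issued) {                       // Core-0: store enters the buffer
        Machine t = m; t.issued = true;
        sb_explore(t, drain_before_load, out);
    }
    if (m.issued && !m.drained) {          // Core-0: buffer drains to memory
        Machine t = m; t.a = 1; t.drained = true;
        sb_explore(t, drain_before_load, out);
    }
    if (m.issued && !m.loaded && (m.drained || !drain_before_load)) {
        Machine t = m; t.r_b = t.b; t.loaded = true;   // Core-0: lwz [b]
        sb_explore(t, drain_before_load, out);
    }
    if (m.pc1 == 0) {                      // Core-1: stw [b]=1
        Machine t = m; t.b = 1; t.pc1 = 1;
        sb_explore(t, drain_before_load, out);
    }
    if (m.pc1 == 1) {                      // Core-1: lwz [a]
        Machine t = m; t.r_a = t.a; t.pc1 = 2;
        sb_explore(t, drain_before_load, out);
    }
}

// Collect all (r_b, r_a) pairs reachable in the toy store-buffer model.
inline std::set<Outcome> sb_outcomes(bool drain_before_load) {
    std::set<Outcome> out;
    sb_explore(Machine{}, drain_before_load, out);
    assert(!out.empty());
    return out;
}
```

Calling sb_outcomes(false) includes the pair (0, 0), because the load may run while the store is still buffered. Calling sb_outcomes(true) does not, which mirrors the claim that a full sync, rather than isync, is needed to order the store before the load.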
If CPU-0 executes stwcx. [a]=1; bne-; isync; lwz [b], and all other CPU-Cores do all steps in sequential consistency with hwsync for each instruction, then can these other CPU-Cores see Store-Load reordering of CPU-0? – Alex