1
votes

On x86/x64, non-temporal store instructions such as MOVNTI and MOVNTPS make weaker memory ordering guarantees than "regular" stores. I understand fences (e.g. SFENCE) are necessary when sharing memory that will be written to non-temporally across threads. However, are fence instructions ever necessary for thread-local memory? If I write to a location via MOVNTPS, is the write guaranteed to be visible to subsequent instructions in the same thread without any fence instruction?

1
A single thread always observes its own actions in program order. The cardinal rule of out-of-order CPUs is that they always behave as if your code ran in program order. (The only exception is when the architecture has other rules: e.g. IA-64 was an experiment in explicit parallelism, where each VLIW block of instructions executed in parallel. So you could do a swap with { a=b, b=a } in the same block of instructions or something. I guess the branch-delay slot in some RISC architectures is another example.) - Peter Cordes
There probably aren't ISAs where single-threaded code needs to fence anything. Cores can snoop their own store buffers pretty easily. - Peter Cordes

1 Answer

4
votes

Yes, they will be visible to the same thread without fences. See section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families", in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, which says, among other things:

for memory regions defined as write-back cacheable, [...] Reads may be reordered with older writes to different locations but not with older writes to the same location.

and

Writes to memory are not reordered with other writes, with the following exceptions: -- streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD);
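To make the same-thread case concrete, here is a minimal sketch in C using the SSE/SSE2 intrinsics `_mm_stream_ps` and `_mm_stream_si32`, which typically compile to MOVNTPS and MOVNTI. The function names `nt_store_then_load_ps` and `nt_store_then_load_si32` are hypothetical helpers invented for this illustration. It assumes an x86-64 target and a compiler that ships `<immintrin.h>`. Note that an optimizing compiler may forward the stored value itself, so this illustrates the guarantee rather than proving it.

```c
#include <immintrin.h>  /* _mm_stream_ps, _mm_stream_si32, _mm_set1_ps */
#include <stdalign.h>   /* alignas */

/* Hypothetical helper: write four floats with a non-temporal store
   (usually MOVNTPS), then read them back in the same thread with no
   fence in between. Returns 1 if every lane holds the stored value. */
int nt_store_then_load_ps(float value)
{
    alignas(16) float buf[4] = {0};  /* _mm_stream_ps needs 16-byte alignment */

    _mm_stream_ps(buf, _mm_set1_ps(value));

    /* No SFENCE here: the same thread sees its own stores in program order. */
    for (int i = 0; i < 4; i++) {
        if (buf[i] != value) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: same idea with a 32-bit integer store
   (usually MOVNTI). Returns the value read back. */
int nt_store_then_load_si32(int value)
{
    int slot = 0;

    _mm_stream_si32(&slot, value);

    /* Again no fence: this load is to the same location as the older store. */
    return slot;
}
```

An SFENCE (`_mm_sfence()`) would only come into play before publishing such a buffer to another thread, for example before setting a "ready" flag that a consumer polls.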