On x86/x64, non-temporal store instructions such as MOVNTI and MOVNTPS make weaker memory ordering guarantees than "regular" stores. I understand fences (e.g. SFENCE) are necessary when sharing memory that will be written to non-temporally across threads. However, are fence instructions ever necessary for thread-local memory? If I write to a location via MOVNTPS, is the write guaranteed to be visible to subsequent instructions in the same thread without any fence instruction?
1 Answer
Yes, they will be visible without fences. See section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families", in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, which says, among other things:
for memory regions defined as write-back cacheable, [...] Reads may be reordered with older writes to different locations but not with older writes to the same location.
and
Writes to memory are not reordered with other writes, with the following exceptions: -- streaming stores (writes) executed with the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD);
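To make the distinction concrete, here is a minimal sketch in C using the SSE2 intrinsics `_mm_stream_si32` (MOVNTI) and `_mm_sfence` (SFENCE). The function names `nt_store_then_load` and `nt_publish` are hypothetical, made up for this illustration. The first reads back a non-temporal store in the same thread with no fence; the second shows the cross-thread case from the question, where a fence is placed before the flag that another thread would poll.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <emmintrin.h>   /* SSE2: _mm_stream_si32; also pulls in _mm_sfence */

/* Hypothetical helper: same-thread use, no fence.
   The load that follows the MOVNTI is to the same location, so it
   observes the stored value in program order. */
int nt_store_then_load(int value)
{
    int slot = 0;
    _mm_stream_si32(&slot, value);   /* MOVNTI: non-temporal store */
    return slot;                     /* no SFENCE needed here */
}

/* Hypothetical helper: cross-thread publication.
   Streaming stores are an exception to write-write ordering, so an
   SFENCE is issued before the flag store that a reader thread would
   poll. The release store alone is assumed not to order the earlier
   non-temporal store, which is why the fence is explicit. */
void nt_publish(int *payload, int value, atomic_int *ready)
{
    assert(payload != NULL && ready != NULL);
    _mm_stream_si32(payload, value);                       /* MOVNTI */
    _mm_sfence();                                          /* SFENCE */
    atomic_store_explicit(ready, 1, memory_order_release); /* flag   */
}
```

A reader thread would wait for `ready` to become 1 (with an acquire load) before reading `payload`. Within the writing thread itself, `payload` can be read back immediately, with or without the fence.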
[...] { a=b, b=a } in the same block of instructions or something. I guess the branch-delay slot in some RISC architectures is another example.) – Peter Cordes