
I'm working on functional safety, and I need to verify some x86 CPU instructions, such as LFENCE, SFENCE and MFENCE.

So far I can demonstrate MFENCE using the case from Intel SDM section 8.2.3.4, "Loads May Be Reordered with Earlier Stores to Different Locations":

// Runs on CPU 0: X = 1; mfence; r1 = Y
asm volatile(
    "xor %0, %0\n\t"
    "movl $1, %1\n\t"
    "mfence\n\t"
    "movl %2, %0\n\t"
    : "=&r"(r1), "=m"(X)   // early-clobber: %0 is written before %2 is read
    : "m"(Y)
    : "memory");

// Runs on CPU 1: Y = 1; mfence; r2 = X
asm volatile(
    "xor %0, %0\n\t"
    "movl $1, %1\n\t"
    "mfence\n\t"
    "movl %2, %0\n\t"
    : "=&r"(r2), "=m"(Y)
    : "m"(X)
    : "memory");

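The code above only shows that MFENCE can prevent this memory reordering. I detect it by running the two blocks on different processors and comparing the values of r1 and r2 with and without the mfence: the outcome r1 == 0 && r2 == 0 can only appear once the mfence is removed.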

So I'm wondering how I can verify LFENCE and SFENCE in a similar way. I couldn't find a corresponding example in the SDM.

Can you clarify how the code you've shown verifies the documented behavior of mfence? You actually need to write many tests to check every property of all three fence instructions on Intel and AMD processors, which is going to take a lot of effort. – Hadi Brais
@HadiBrais: this code appears to reproduce the test from preshing.com/20120515/memory-reordering-caught-in-the-act, where StoreLoad reordering on normal WB memory is visible on x86. It's pretty clear that's all they're trying to test. – Peter Cordes
Thanks, Peter, for the comments. The link answers exactly what Hadi asked. @HadiBrais If you want, you can clone my test code from github.com/ysun/acrn-unit-test.git (branch 'memory_ordering'). – Yi Sun

1 Answer


Related: Does the Intel Memory Model make SFENCE and LFENCE redundant?

sfence has no real effect unless you're using NT (non-temporal) stores (see footnote 1). If you NT-store data and then a pointer to that data (or a "ready" flag), a reader can see the old value of the data even if it sees the new pointer / flag value. sfence can be used to ensure that the two stores become observable in program order.

lfence is useless for memory ordering unless you're doing NT loads from a WC memory region (like video RAM). You'll have a very hard time creating a case where commenting it out creates a detectable difference in memory ordering.

The main use for lfence is to serialize execution, not memory. See Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths.


Since you asked about C, not just asm, there's a related answer about when you should use _mm_sfence() and the other intrinsics: When should I use _mm_sfence _mm_lfence and _mm_mfence. (Usually you only need asm("" ::: "memory"); unless NT stores are in flight, because blocking compile-time reordering gives you acquire/release ordering without any runtime barrier instructions.)


Footnote 1: That's true for normal WB (WriteBack) memory cacheability settings. In user-space under a normal OS, that's what you always have unless you did something very special.

For other memory types (MTRR or PAT settings): NT stores on uncacheable memory have no special effect and are still strongly ordered. NT stores on WC, WB, or WT memory (or normal stores to WC memory) are weakly ordered, which is what makes sfence useful before storing a buffer_ready flag for another thread.

SSE4.1 movntdqa loads from WB memory are not weakly ordered. Unlike NT stores, movntdqa doesn't override the memory type's ordering semantics. On current CPUs, nothing special happens at all on WB memory; it's just a less-efficient movdqa load. Only use it on WC memory.