3
votes

As we know, on the x86 architecture acquire-release consistency is provided automatically, i.e. all operations are ordered without any fences, except that a store followed by a later load may be reordered. (As Herb Sutter says on page 34: https://onedrive.live.com/view.aspx?resid=4E86B0CF20EF15AD!24884&app=WordPdf&authkey=!AMtj_EflYn2507c )

If we put an MFENCE (a full barrier that orders both loads and stores) between them, then the store and the following load can't be reordered, i.e. we get sequential consistency.
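To make the store-then-load case concrete, here is a minimal sketch of the classic "store buffer" litmus test in portable C++. It assumes that std::atomic_thread_fence(std::memory_order_seq_cst) is what the compiler lowers to MFENCE (or an equivalent locked instruction) on x86; the function names both_loads_saw_zero and count_both_zero are hypothetical and only serve the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test. Each thread stores to its own variable,
// executes a full fence, then loads the other thread's variable.
// Returns true only if BOTH loads observed the initial value 0.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Full barrier between the store and the next load.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the "both zero" outcome appeared.
int count_both_zero(int runs) {
    assert(runs > 0);
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        if (both_loads_saw_zero()) ++hits;
    }
    return hits;
}
```

With the two fences in place, count_both_zero should always return 0. If the fences are removed, the "both zero" outcome becomes legal on x86, although it may take many runs before it actually shows up.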

But if we mark memory as WC (Write Combining), do we get any consistency automatically without any fences, maybe acquire-release?

Or is it the case that if we use SSE instructions with WC memory we have no consistency at all, while if we use plain MOV instructions with WC memory we still have acquire-release consistency?


2 Answers

1
votes

As stated here: How MTRR registers implemented?

Stores to WC Memory: The WC memory type is well-suited to an area of memory (e.g., the video frame buffer) that has the following characteristics:
1. The processor does not cache from WC memory.
2. Speculative execution of loads from WC memory is permitted.
3. Stores to WC memory are deposited in the processor's Write Combining Buffers (WCBs).
4. Each WCB can hold one line (64 bytes of data).
5. As stores are performed to a line of WC memory space, the bytes are accumulated in the WCB assigned to record writes to that line of memory space.
6. A subsequent store to a location in a WCB can overwrite a byte that was deposited in that location by an earlier store to that location. In other words, multiple writes to the same location are collapsed so that the location reflects the last data byte written to that location.
7. When the WCBs are ultimately dumped to external memory over the FSB, data is not necessarily written to memory in the same order in which the earlier programmatic stores were executed. The device being written to must tolerate this type of behavior (i.e., it must function correctly). See "WCB FSB Transactions" on page 1080 for more information.

I believe there is no "automatic consistency" for WC memory, since the buffered writes are "not necessarily written to memory in the same order in which the earlier programmatic stores were executed".
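In practice this means a writer has to add an explicit store fence before signalling that the data is ready. The following is only a sketch under stated assumptions: it uses non-temporal stores (_mm_stream_si32) on ordinary memory as a stand-in for stores to a real WC mapping, since both go through the write-combining buffers and are weakly ordered; it targets x86 with SSE2; and publish_buffer and consume_buffer are hypothetical helper names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // _mm_stream_si32, _mm_sfence (x86, SSE2)

// Hypothetical producer: fills buf with weakly ordered stores, then publishes.
void publish_buffer(int* buf, std::size_t n, int value,
                    std::atomic<bool>& ready) {
    assert(buf != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        // Non-temporal store: may sit in a write-combining buffer and
        // become visible out of program order relative to other stores.
        _mm_stream_si32(buf + i, value);
    }
    // SFENCE: all earlier stores, including the weakly ordered ones,
    // become globally visible before any store that follows.
    _mm_sfence();
    ready.store(true, std::memory_order_release);
}

// Hypothetical consumer: waits for the flag, then sums the buffer.
long consume_buffer(const int* buf, std::size_t n,
                    const std::atomic<bool>& ready) {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the producer has published
    }
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

Without the _mm_sfence call, another core could observe ready == true while some of the buffered stores are still sitting in a WCB, which is exactly the missing "automatic consistency" described above.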

0
votes

This is a bad idea. WC memory is very slow to read (around 20x slower) and requires special streaming-load instructions (e.g. MOVNTDQA from SSE4.1, or its AVX2 form) to speed reads up. Using MFENCE is significantly faster.

Coherency is not guaranteed either.