It seems like the accepted definition of acquire and release semantics is something like this: (Quoted from http://msdn.microsoft.com/en-us/library/windows/hardware/ff540496(v=vs.85).aspx)

An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.

I have read briefly about half memory barriers, which supposedly come in two flavors, acquire barriers and release barriers, following the semantics described above.

Looking for a real example among hardware instructions, I came across SFENCE. This blog (http://peeterjoot.wordpress.com/2009/12/04/intel-memory-ordering-fence-instructions-and-atomic-operations/) says that it is a form of release fence/barrier:

Intel provides a bidirectional fence instruction MFENCE, an acquire fence LFENCE, and a release fence SFENCE.

However, reading the definition of SFENCE, it doesn't seem to provide release semantics, since it doesn't order loads at all? Release semantics, as I understand them, define an ordering with respect to all memory operations (loads and stores).

1 Answer

LFENCE does not have acquire semantics; SFENCE does not have release semantics. There's a good reason for that: Having a stand-alone fence instruction with acquire semantics, or release semantics, turns out to be almost completely useless. For an acquire/release to do any good, it must be tied to a memory operation.

For example, consider the common idiom for sending data between two threads:

  1. Processor A writes into a buffer.
  2. Processor A writes "true" into a flag.
  3. Processor B waits until the flag is true.
  4. Processor B reads the buffer.

Note that processor A must ensure that its write to the flag becomes visible only after its write to the buffer. Now suppose we had an "RFENCE" instruction that is a release fence. If we put the instruction immediately after step 1, it does no good: a release only keeps earlier operations from moving below it, so the write in step 2 is allowed to appear to migrate up over RFENCE and up over step 1.

A similar argument shows that an "AFENCE" instruction that does an acquire is equally useless for ensuring that the read of the flag in step 3 does not appear to migrate downwards across step 4.

Itanium solved the problem elegantly by providing write-with-release and load-with-acquire instructions that tie the fence to a memory operation.

Back to IA-32 and Intel64: If a program does not use "non-temporal" instructions, then the remaining instructions behave as if every load does an "acquire" and every store does a "release". See Section 8.2.3 (and subsections) of Intel® 64 and IA-32 Architectures Software Developer's Manual: Vol. 3A. If there are "non-temporal" stores involved, you have several ways to enforce a fence:

  • Use SFENCE
  • Use MFENCE - somewhat overkill
  • Use a LOCK-prefixed instruction (such as "LOCK INC") to write the flag. LOCK-prefixed instructions implicitly have MFENCEs.
  • Use XCHG, which acts as if it has an implicit LOCK prefix, to write the flag.

For example, if in the earlier idiom the buffer is written using non-temporal stores, have processor A issue an SFENCE or MFENCE between steps 1 and 2. Or use XCHG to write the flag.
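As a rough sketch of the non-temporal case, the idiom might look like this in C++ with compiler intrinsics. The names `nt_handoff`, `buffer`, `ready`, and the macro `HAVE_NT_STORES` are made up for illustration, and the code assumes a compiler that provides `<immintrin.h>` on x86-64. On other targets it falls back to ordinary stores, where the release store alone is enough.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>   // _mm_stream_si32, _mm_sfence
#define HAVE_NT_STORES 1 // hypothetical macro, for this sketch only
#endif

// Hypothetical helper: A fills a buffer with non-temporal stores, B sums it.
int nt_handoff(int value) {
    alignas(64) int buffer[16] = {};
    std::atomic<bool> ready{false};
    int sum = 0;

    std::thread a([&] {
        for (int i = 0; i < 16; ++i) {
#ifdef HAVE_NT_STORES
            _mm_stream_si32(&buffer[i], value);  // step 1: non-temporal store
#else
            buffer[i] = value;                   // step 1: ordinary store
#endif
        }
#ifdef HAVE_NT_STORES
        // Non-temporal stores are not ordered by a plain x86 store, so
        // fence them explicitly before publishing the flag.
        _mm_sfence();
#endif
        ready.store(true, std::memory_order_release);  // step 2
    });

    std::thread b([&] {
        while (!ready.load(std::memory_order_acquire)) {  // step 3
            std::this_thread::yield();
        }
        for (int i = 0; i < 16; ++i) {
            sum += buffer[i];                             // step 4
        }
    });

    a.join();
    b.join();
    return sum;
}
```

Calling `nt_handoff(3)` should return 48 (16 elements of 3). Replacing the fence plus store with `ready.exchange(true)` would be the XCHG variant, since x86 compilers typically implement an atomic exchange with that instruction.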

All of the above remarks apply to the hardware. When using a high-level language, be sure that the compiler does not damage the critical ordering of events. The C++11 atomic operations library exists so that you can tell the compiler and hardware what you intend.
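Here is a minimal sketch of the four-step idiom using the C++11 atomics, with the release tied to the flag store and the acquire tied to the flag load, as argued above. The function name `handoff` and the variable names are invented for this example; they are not from any particular code base.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: thread A publishes `value`, thread B picks it up.
int handoff(int value) {
    int buffer = 0;                  // ordinary, non-atomic payload
    std::atomic<bool> ready{false};  // the flag
    int seen = -1;

    std::thread a([&] {
        buffer = value;                                 // step 1: write the buffer
        ready.store(true, std::memory_order_release);   // step 2: store with release
    });

    std::thread b([&] {
        while (!ready.load(std::memory_order_acquire)) {  // step 3: load with acquire
            std::this_thread::yield();
        }
        seen = buffer;                                    // step 4: read the buffer
    });

    a.join();
    b.join();
    assert(ready.load(std::memory_order_relaxed));  // flag was published
    return seen;
}
```

With these orderings, `handoff(42)` returns 42: the acquire load that observes `true` synchronizes with the release store, so the write to `buffer` is visible in step 4. Neither the compiler nor the hardware may move step 1 below step 2 or step 4 above step 3.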