
The documentation for sfence says:

Performs a serializing operation on all store-to-memory instructions that were issued prior the SFENCE instruction.

What does "serializing operation" mean?

Does it mean that all store-to-memory instructions issued prior to the sfence instruction are guaranteed to complete before execution continues with the instructions after sfence?

Have you read the rest of the Intel documentation? There ought to be some section that explains what a serializing operation is. - fuz
Yes, serializing instructions are also full memory barriers; they effectively flush the whole pipeline (see "Does lock xchg have the same behavior as mfence?"), unlike lfence on Intel, which just serializes instruction execution, not the store buffer. See also "Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?" - Peter Cordes
You are reading an old version of the doc. The new version removes the language above and replaces it with much clearer language that says, among other things, "The processor ensures that every store prior to SFENCE is globally visible before any store after SFENCE becomes globally visible." That is about as good a one-sentence summary as you are going to get. - BeeOnRope

3 Answers


sfence makes sure that all prior stores in program order become globally visible before any later stores in program order become globally visible. There are two differences from what you've written. First, sfence does not serialize only the prior stores that have already been issued; it serializes all prior stores, whether or not they have been issued. Second, it serializes only with respect to later stores, not all later instructions. That's what "serializing operation" means in the context of sfence.

You've quoted only the first sentence from the documentation, but every sentence matters.


The English word "serial", adjective form:

  1. occurring in a series rather than simultaneously

  2. Computers.
    a) of or relating to the apparent or actual performance of data-processing operations one at a time (distinguished from parallel).

    b) of or relating to the transmission or processing of each part of a whole in sequence, as each bit of a byte or each byte of a computer word (distinguished from parallel).

(Serialization can also mean converting an object representation to a bit-stream or byte-stream which can be stored to disk or sent over a network outside of the program. But that's not the meaning that applies in the context of sfence).

Database serializability (https://en.wikipedia.org/wiki/Serializability) is a more closely related concept.


SFENCE orders the global visibility of earlier stores with respect to SFENCE itself, and later stores. Serializing = imposing an order on things, stopping them from overlapping or happening in parallel.


Note that in Intel terminology, "serializing instruction" has a special meaning: an instruction that flushes the store buffer and the out-of-order instruction pipeline before any later instructions can execute. (Later instructions can decode and maybe even issue into the out-of-order core, but not execute.) See also "How many memory barrier instructions does an x86 CPU have?"

sfence is not a "serializing instruction" in that sense; it only orders NT stores with respect to each other and regular stores. (Regular stores are already ordered with respect to each other, so sfence has no effect if there are no NT stores in flight. All you need for correct release semantics is to put regular stores in the right order, e.g. with a compiler barrier to stop compile-time reordering.)
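As a rough illustration of the one case where sfence matters, here is a minimal C sketch. It assumes an x86-64 compiler with the SSE/SSE2 intrinsics `_mm_stream_si32` (a `movnti` NT store) and `_mm_sfence`; the names `payload`, `ready`, `produce` and `consume` are hypothetical. The producer fills a buffer with weakly ordered NT stores, then uses sfence so they become globally visible before the ordinary store that sets the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <emmintrin.h>   /* _mm_stream_si32 (SSE2) */
#include <xmmintrin.h>   /* _mm_sfence (SSE) */

#define N 1024

/* Hypothetical shared buffer and ready flag, for illustration only. */
static int payload[N];
static atomic_bool ready = false;

/* Producer: write the buffer with non-temporal (NT) stores, then publish. */
void produce(void)
{
    for (int i = 0; i < N; i++)
        _mm_stream_si32(&payload[i], i * 2);   /* weakly ordered store */

    /* Without this fence, the flag store below could become globally
       visible before some of the NT stores above. */
    _mm_sfence();

    /* Release store: stops compile-time reordering of earlier stores past it.
       The hardware ordering of the NT stores comes from the sfence above. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: once the flag is seen, every payload element must be visible. */
long consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin until the producer publishes */

    long sum = 0;
    for (int i = 0; i < N; i++) {
        assert(payload[i] == i * 2);
        sum += payload[i];
    }
    return sum;
}
```

In real use `produce()` and `consume()` would run on different threads; calling them one after the other on a single thread only checks that the code runs, not the cross-core ordering itself.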

"serializing" in Intel's definition of sfence is just the plain English meaning of the term, not the "serializing instruction" x86 special meaning.


Current wording of Intel's ISA ref manual entry for sfence:

Intel rewrote the opening paragraph to say "orders" instead of "serializes", except in the short description: Serializes store operations.

The main Description is:

Orders processor execution relative to all memory stores prior to the SFENCE instruction. The processor ensures that every store prior to SFENCE is globally visible before any store after SFENCE becomes globally visible. The SFENCE instruction is ordered with respect to memory stores, other SFENCE instructions, MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to memory loads or the LFENCE instruction.

The first sentence is still kind of bogus, though: execution isn't ordered, only the commit of stores to L1d cache.


An sfence prevents stores before the fence from being re-ordered with respect to stores after the fence. That's it. Don't focus on the "serializing" part: Intel has removed the text you quoted from the current version of the manual (you linked an obsolete source).

The new text says (see footnote 1; emphasis mine):

Orders processor execution relative to all memory stores prior to the SFENCE instruction. The processor ensures that every store prior to SFENCE is globally visible before any store after SFENCE becomes globally visible. The SFENCE instruction is ordered with respect to memory stores, other SFENCE instructions, MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to memory loads or the LFENCE instruction.

Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The SFENCE instruction provides a performance-efficient way of ensuring store ordering between routines that produce weakly-ordered results and routines that consume this data.

The second (emphasized) sentence is the key: this instruction is there to order stores.

It doesn't (necessarily) make stores become visible sooner; that happens naturally on a coherent architecture like x86. It doesn't necessarily serialize the instructions surrounding the fence, including stores: it just makes sure stores aren't apparently reordered across the barrier.

Here's a secret, though: this instruction is mostly useless in x86 code. The x86 memory model already guarantees that normal stores are ordered with respect to each other: stores from a given CPU become visible to all other CPUs in program order, so sfence doesn't add anything. The only cases where sfence is useful involve relatively obscure features like non-temporal stores, or really obscure ones like the WC (write-combining) memory type. If you aren't using those, you don't need this instruction.
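For contrast, here is a minimal C11 sketch of publishing data with only ordinary stores (the names `message`, `published`, `publish` and `read_message` are hypothetical). No sfence appears: the release store only has to stop the compiler from reordering, and x86 hardware already keeps the two stores in program order. With GCC or Clang on x86 the release store typically compiles to a plain `mov`, but that is a codegen detail worth checking with your own compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical message slot and flag, for illustration only. */
static int message;
static atomic_bool published = false;

/* Writer: an ordinary store followed by a release store of the flag.
   No SFENCE is needed because no NT stores or WC memory are involved. */
void publish(int value)
{
    message = value;   /* ordinary store */
    atomic_store_explicit(&published, true, memory_order_release);
}

/* Reader: after seeing the flag with acquire semantics, the message
   written before the flag is guaranteed to be visible. */
int read_message(void)
{
    while (!atomic_load_explicit(&published, memory_order_acquire))
        ; /* spin until the writer publishes */
    return message;
}
```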


1 I've also linked an unofficial source, as there is no official HTML source that I'm aware of, but I checked that it is up to date on sfence as of May 2018.