I'm trying to wrap my head around the subtleties of memory barriers and volatile reads/writes. I'm reading Joseph Albahari's threading article here:
http://www.albahari.com/threading/part4.aspx
and I'm stumbling over the question of when a memory barrier is needed before a read/write and when it is needed after. In this code from the section "Full fences", he puts a memory barrier after each write and before each read:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier();    // Barrier 1
        _complete = true;
        Thread.MemoryBarrier();    // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier();    // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier();    // Barrier 4
            Console.WriteLine (_answer);
        }
    }
}
He goes on to explain:
Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.
Question #1: I have no problem with barriers 1 and 4, since they prevent reordering across those points. I don't entirely understand why barriers 2 and 3 are necessary, though. Can someone explain, especially considering how volatile reads and writes are implemented in the Thread class (shown next)?
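For comparison, here is my own sketch of the same class written with the C# volatile keyword instead of explicit full fences (this rewrite is mine, not Albahari's). A volatile write has release semantics and a volatile read has acquire semantics, which is weaker than the four full fences above:

```csharp
// Sketch (my own, for comparison): the same class using the C# 'volatile'
// keyword. A volatile write is a release (earlier memory operations cannot
// move after it); a volatile read is an acquire (later memory operations
// cannot move before it). Unlike the four full fences above, this does NOT
// prevent a volatile write followed by a volatile read of a *different*
// variable from being reordered.
class Foo
{
    int _answer;
    volatile bool _complete;

    void A()
    {
        _answer = 123;
        _complete = true;   // release: the write to _answer can't move below this
    }

    void B()
    {
        if (_complete)      // acquire: the read of _answer can't move above this
            Console.WriteLine (_answer);
    }
}
```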
Now where I really start to get confused is the actual implementation of Thread.VolatileRead/Write():
[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static void VolatileWrite (ref int address, int value)
{
    MemoryBarrier();
    address = value;
}

[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static int VolatileRead (ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
As you can see, in contrast to the previous example, the built-in volatile methods place the memory barrier before each write (instead of after) and after each read (instead of before). So if we were to rewrite the previous example with an equivalent version based on the built-in volatile methods, it would look like this instead:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        Thread.MemoryBarrier();    // Barrier 1
        _answer = 123;
        Thread.MemoryBarrier();    // Barrier 2
        _complete = true;
    }

    void B()
    {
        if (_complete)
        {
            Thread.MemoryBarrier();    // Barrier 3
            Console.WriteLine (_answer);
            Thread.MemoryBarrier();    // Barrier 4
        }
    }
}
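Collapsing those barrier/access pairs back into the built-in calls gives the version below. One caveat: Thread.VolatileRead/Write have no bool overload, so I've changed _complete to an int (0/1) here purely to make the calls compile — that change is mine:

```csharp
// The expanded version above, collapsed back into direct calls to the
// built-in methods. _complete is an int (0/1) instead of a bool because
// Thread.VolatileRead/Write have no bool overload.
class Foo
{
    int _answer;
    int _complete;

    void A()
    {
        Thread.VolatileWrite(ref _answer, 123);   // Barrier 1, then write
        Thread.VolatileWrite(ref _complete, 1);   // Barrier 2, then write
    }

    void B()
    {
        if (Thread.VolatileRead(ref _complete) == 1)          // read, then Barrier 3
            Console.WriteLine (Thread.VolatileRead(ref _answer)); // read, then Barrier 4
    }
}
```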
Question #2: Are both Foo classes functionally equivalent? Why or why not? If barriers 2 and 3 (in the first Foo class) are needed to guarantee that the value is written and that the actual value is read, then wouldn't the Thread.VolatileXXX methods be kind of useless?
There are a couple of similar questions on StackOverflow with accepted answers along the lines of "barrier 2 ensures the write to _complete isn't cached", but none of them address why Thread.VolatileWrite() puts the memory barrier before the write if that's the case, or why barrier 3 is needed when Thread.VolatileRead() puts the memory barrier after the read yet still guarantees an up-to-date value. I think that's what's throwing me off the most here.
UPDATE:
Okay, so after more reading and thinking, I have a theory, and I've updated the source code above with attributes I think might be relevant. I don't think the memory barriers in the Thread.VolatileRead/Write methods are there to ensure "freshness" of the values at all, but rather to enforce reordering guarantees: putting the barrier after reads and before writes ensures that no write can be moved before any read (but not vice versa).
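To make that claim concrete, here's my own sketch of the two orderings (the class and field names are mine, purely for illustration), assuming VolatileRead expands to "read, then fence" and VolatileWrite to "fence, then write":

```csharp
// Hypothetical sketch (names are mine) of the reordering claim above,
// assuming VolatileRead = "read x; FENCE" and VolatileWrite = "FENCE; write y".
class ReorderSketch
{
    int _x, _y;

    void ReadThenWrite()
    {
        int r = Thread.VolatileRead(ref _x);  // expands to: read _x; FENCE;
        Thread.VolatileWrite(ref _y, 1);      // expands to: FENCE; write _y;
        // Effective stream: read _x; FENCE; FENCE; write _y;
        // A fence separates the read from the write, so the write can
        // never be moved before the read.
    }

    void WriteThenRead()
    {
        Thread.VolatileWrite(ref _y, 1);      // expands to: FENCE; write _y;
        int r = Thread.VolatileRead(ref _x);  // expands to: read _x; FENCE;
        // Effective stream: FENCE; write _y; read _x; FENCE;
        // No fence sits between the write and the read, so store-load
        // reordering (which even x86 permits) remains possible.
    }
}
```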
From what I've been able to find, every write on x86 guarantees cache coherence by invalidating the corresponding cache line on other cores, so "freshness" is guaranteed as long as the value isn't cached in a register. My theory on how VolatileRead/Write ensure the value isn't in a register (which may be way off, but I think I'm on the right track) is that they rely on a .NET implementation detail: because the methods are marked MethodImplOptions.NoInlining (which, as you can see above, they are), the value has to be passed to/from the method rather than being inlined as a local variable, and so it has to be accessed from memory/cache instead of directly through a register. That would eliminate the need for an additional memory barrier after the write and before the read. I have no idea if that's actually the case, but it's the only way I can see it working properly.
Can anyone confirm or deny that this is the case?