I'm trying to wrap my head around the subtleties of memory barriers and volatile reads/writes. I'm reading Joseph Albahari's threading article here:
http://www.albahari.com/threading/part4.aspx
and I'm stumbling over the question of when a memory barrier is needed before a read/write and when it is needed after. In this code from the section "Full fences", he puts a memory barrier after each write and before each read:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier();    // Barrier 1
        _complete = true;
        Thread.MemoryBarrier();    // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier();    // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier();    // Barrier 4
            Console.WriteLine (_answer);
        }
    }
}
He goes on to explain:
Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.
Question #1: I have no problem with barriers 1 and 4, since they prevent reordering across those points. I don't entirely understand why barriers 2 and 3 are necessary, though. Can someone explain, especially considering how volatile reads and writes are implemented in the Thread class (shown next)?
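For comparison, here is my own sketch of the same class written with the C# volatile keyword instead of explicit full fences (this rewrite is mine, not Albahari's). A volatile write has release semantics and a volatile read has acquire semantics, which is weaker than the four full fences above:

```csharp
// Sketch (my own, for comparison): the same class using the C# 'volatile'
// keyword. A volatile write is a release (earlier memory operations cannot
// move after it); a volatile read is an acquire (later memory operations
// cannot move before it). Unlike the four full fences above, this does NOT
// prevent a volatile write followed by a volatile read of a *different*
// variable from being reordered.
class Foo
{
    int _answer;
    volatile bool _complete;

    void A()
    {
        _answer = 123;
        _complete = true;   // release: the write to _answer can't move below this
    }

    void B()
    {
        if (_complete)      // acquire: the read of _answer can't move above this
            Console.WriteLine (_answer);
    }
}
```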
Now where I really start to get confused is the actual implementation of Thread.VolatileRead/Write():
[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static void VolatileWrite (ref int address, int value)
{
    MemoryBarrier();
    address = value;
}

[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static int VolatileRead (ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
As you can see, in contrast to the previous example, the built-in volatile methods place the memory barrier before each write (instead of after) and after each read (instead of before). So if we were to rewrite the previous example with an equivalent version based on the built-in volatile methods, it would look like this instead:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        Thread.MemoryBarrier();    // Barrier 1
        _answer = 123;
        Thread.MemoryBarrier();    // Barrier 2
        _complete = true;
    }

    void B()
    {
        if (_complete)
        {
            Thread.MemoryBarrier();    // Barrier 3
            Console.WriteLine (_answer);
            Thread.MemoryBarrier();    // Barrier 4
        }
    }
}
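Collapsing those barrier/access pairs back into the built-in calls gives the version below. One caveat: Thread.VolatileRead/Write have no bool overload, so I've changed _complete to an int (0/1) here purely to make the calls compile — that change is mine:

```csharp
// The expanded version above, collapsed back into direct calls to the
// built-in methods. _complete is an int (0/1) instead of a bool because
// Thread.VolatileRead/Write have no bool overload.
class Foo
{
    int _answer;
    int _complete;

    void A()
    {
        Thread.VolatileWrite(ref _answer, 123);   // Barrier 1, then write
        Thread.VolatileWrite(ref _complete, 1);   // Barrier 2, then write
    }

    void B()
    {
        if (Thread.VolatileRead(ref _complete) == 1)          // read, then Barrier 3
            Console.WriteLine (Thread.VolatileRead(ref _answer)); // read, then Barrier 4
    }
}
```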
Question #2: Are both Foo classes functionally equivalent? Why or why not? If barriers 2 and 3 (in the first Foo class) are needed to guarantee that the value is written and that the actual value is read, then wouldn't the Thread.VolatileXXX methods be kind of useless?
There are a couple of similar questions on StackOverflow with accepted answers along the lines of "barrier 2 ensures the write to _complete isn't cached", but none of them address why Thread.VolatileWrite() puts the memory barrier before the write if that's the case, or why barrier 3 is needed when Thread.VolatileRead() puts the memory barrier after the read yet still guarantees an up-to-date value. I think that's what's throwing me off the most here.
UPDATE:
Okay, so after more reading and thinking, I have a theory, and I've updated the source code above with attributes I think might be relevant. I don't think the memory barriers in the Thread.VolatileRead/Write methods are there to ensure "freshness" of the values at all, but rather to enforce reordering guarantees: putting the barrier after reads and before writes ensures that no write can be moved before any read (but not vice versa).
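To make that claim concrete, here's my own sketch of the two orderings (the class and field names are mine, purely for illustration), assuming VolatileRead expands to "read, then fence" and VolatileWrite to "fence, then write":

```csharp
// Hypothetical sketch (names are mine) of the reordering claim above,
// assuming VolatileRead = "read x; FENCE" and VolatileWrite = "FENCE; write y".
class ReorderSketch
{
    int _x, _y;

    void ReadThenWrite()
    {
        int r = Thread.VolatileRead(ref _x);  // expands to: read _x; FENCE;
        Thread.VolatileWrite(ref _y, 1);      // expands to: FENCE; write _y;
        // Effective stream: read _x; FENCE; FENCE; write _y;
        // A fence separates the read from the write, so the write can
        // never be moved before the read.
    }

    void WriteThenRead()
    {
        Thread.VolatileWrite(ref _y, 1);      // expands to: FENCE; write _y;
        int r = Thread.VolatileRead(ref _x);  // expands to: read _x; FENCE;
        // Effective stream: FENCE; write _y; read _x; FENCE;
        // No fence sits between the write and the read, so store-load
        // reordering (which even x86 permits) remains possible.
    }
}
```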
From what I've been able to find, every write on x86 guarantees cache coherence by invalidating the corresponding cache line on other cores, so "freshness" is guaranteed as long as the value isn't cached in a register. My theory on how VolatileRead/Write ensure the value isn't in a register (which may be way off, but I think I'm on the right track) is that they rely on a .NET implementation detail: because the methods are marked MethodImplOptions.NoInlining (which, as you can see above, they are), the value has to be passed to/from the method rather than being inlined as a local variable, and so it has to be accessed from memory/cache instead of directly through a register. That would eliminate the need for an additional memory barrier after the write and before the read. I have no idea if that's actually the case, but it's the only way I can see it working properly.
Can anyone confirm or deny that this is the case?