It looks like you are basically asking how the lock works. How can the lock maintain internal state in an atomic manner without already having the lock built? It seems like a chicken and egg problem at first does it not?
The magic all happens because of a compare-and-swap (CAS) operation. The CAS operation is a hardware level instruction that does 2 important things.
- It generates a memory barrier so that instruction reordering is constrained.
- It compares the contents of a memory address with another value and if they are equal then the original value is replaced with a new value. It does all of this in an atomic manner.
At the most fundamental level this is how the trick is accomplished. It is not that all other threads are blocked from reading while another is writing. That is totally the wrong way to think about it. What actually happens is that all threads are acting as writers simultaneously. The strategy is more optimistic than it is pessimistic. Every thread is trying to acquire the lock by performing this special kind of write called a CAS. You actually have access to a CAS operation in .NET via the Interlocked.CompareExchange
(ICX) method. Every synchronization primitive can be built from this single operation.
If I were going to write a Monitor
-like class (that is what the lock
keyword uses behind the scenes) from scratch entirely in C# I could do it using the Interlocked.CompareExchange
method. Here is an overly simplified implementation. Please keep in mind that this is most certainly not how the .NET Framework does it.1 The reason I present the code below is to show you how it could be done in pure C# code without the need for CLR magic behind the scenes and because it might get you thinking about how Microsoft could have implemented it.
public class SimpleMonitor
{
private int m_LockState = 0;
public void Enter()
{
int iterations = 0;
while (!TryEnter())
{
if (iterations < 10) Thread.SpinWait(4 << iterations);
else if (iterations % 20 == 0) Thread.Sleep(1);
else if (iterations % 5 == 0) Thread.Sleep(0);
else Thread.Yield();
iterations++;
}
}
public void Exit()
{
if (!TryExit())
{
throw new SynchronizationLockException();
}
}
public bool TryEnter()
{
return Interlocked.CompareExchange(ref m_LockState, 1, 0) == 0;
}
public bool TryExit()
{
return Interlocked.CompareExchange(ref m_LockState, 0, 1) == 1;
}
}
This implementation demonstrates a couple of important things.
- It shows how the ICX operation is used to atomically read and write the lock state.
- It shows how the waiting might occur.
Notice how I used Thread.SpinWait
, Thread.Sleep(0)
, Thread.Sleep(1)
and Thread.Yield
while the lock is waiting to be acquired. The waiting strategy is overly simplified, but it does approximate a real life algorithm implemented in the BCL already. I intentionally kept the code simple in the Enter
method above to make it easier to spot the crucial bits. This is not how I would have normally implemented this, but I am hoping it does drive home the salient points.
Also note that my SimpleMonitor
above has a lot of problems. Here are but only a few.
- It does not handle nested locking.
- It does not provide
Wait
or Pulse
methods like the real Monitor
class. They are really hard to do right.
1The CLR will actually use a special block of memory that exists on each reference type. This block of memory is referred to as the "sync block". The Monitor
will manipulate bits in this block of memory to acquire and release the lock. This action may require a kernel event object. You can read more about it on Joe Duffy's blog.