Yes, two cores can take two different locks at the same time. The atomic RMW operation only needs a "cache lock", not a global bus lock, on modern CPUs. e.g. this test code (on Godbolt) is C++ code that compiles to a loop that just repeats an xchg [rdi], ecx
, with each thread using a different std::atomic<int>
object in a different cache line. The total runtime of the program on my i7-6700k is 463ms whether it runs on 1 or 4 threads, so that rules out any kind of system-wide bus lock, confirming that the CPU just uses a MESI cache-lock within the core doing the RMW to make sure it's atomic without disturbing operations of other cores. Uncontended locks scale perfectly when each thread is only locking/unlocking its own lock repeatedly.
Taking a lock that was last released by another core will stall this one for maybe hundreds of clock cycles (40 to 70 nanoseconds is a typical inter-core latency) for the RFO (Read For Ownership) to complete and get exclusive ownership of the cache line, but won't have to retry or anything. Atomic RMW involves a memory barrier (on x86), so memory operations after the lock can't even get started, so the CPU core may be stalled for a while. There is significant cost here, compared to normal loads/stores, which out-of-order exec can't hide as well as some other things.
No, two cores can't take the same lock at the same time1, that's the whole point of a mutex. Correctly-implemented ones don't have the same bug as your example of spin-wait and then separately store a true
.
(Note 1: There are counted locks / semaphores that you can use to allow up to n
threads into a critical section, for some fixed n
, where the resource management problem you want to solve is something other than simple mutual exclusion. But you're only talking about mutexes.)
The critical operation in taking a lock is an atomic RMW, for example x86 xchg [rcx], eax
or lock cmpxchg [rcx], edx
, that stores a 1
(true) and as part of the same operation checks what the old value was. (Can num++ be atomic for 'int num'?). In C++, that would mean using std::atomic<bool> lock;
/ old = lock.exchange(true);
In C#, you have Interlocked.Exchange(). That closes the race window your attempt contained, where two threads could exit the while(_locker){}
loop and then both blindly store a _locker = true
.
Also note that rolling your own spin-loop has problems if you don't use volatile
or Volatile.Read()
to stop the compiler from assuming that no other threads are writing a variable you're reading/writing. (Without volatile, while(foo){}
can optimize into if(!foo) infinite_loop{}
by hoisting the apparently loop-invariant load out of the loop).
(The other interesting part of implementing a lock is what to do if it's not available the first time you try. e.g. how long you keep spinning (and if so exactly how, e.g. the x86 pause
instruction between read-only checks), using CPU time while waiting, before falling back to making a system call to give up the CPU to another thread or process, and have the OS wake you back up when the lock is or might be available again. But that's all performance tuning; actually taking the lock revolves around an atomic RMW.)
Of course, if you're going to do any rolling-your-own, make the increment itself a lock-free atomic RMW with Interlocked.Increment(ref counter);
, as per the example in MS's docs
Does every object contain a "is-being-used" flag and a "raw" object is just being used for containing this flag ? Why not Boolean ?
We know from object sizes that C# doesn't do that. Probably you should just use lock (counter){ counter++; }
instead of inventing a separate. Using a dummy object would make sense if you didn't have an existing object you wanted to manage, but instead some more abstract resource like calling into some function. (Correct me if I'm wrong, I don't use C#; I'm just here for the cpu-architecture and assembly tags. Does lock()
require an object, not a primitive type like int
?)
I'd guess that they instead do what normal C++ implementations of std::atomic<T>
does for objects too large to be lock-free: a hash table of actual mutexes or spinlocks, indexed by C# object address. Where is the lock for a std::atomic?
Even if that guess isn't exactly what C# does, that's the kind of mental model that can make sense of this ability to lock anything without using reserving space in every object.
This can create extra contention (by using the same mutex for two different objects). It could even introduce deadlocks where there shouldn't have been any, which is something the implementation would have to work around. Perhaps by putting the identity of the object being locked into the mutex, so another thread that indexes the same mutex can see that it's actually being used to lock a different object, and then do something about it... This is perhaps where being a "managed" language comes in; Java apparently does the same thing where you can lock any object without having to define a separate lock.
(C++ std::atomic doesn't have this problem because the mutexes are taken/released inside library functions, with no possibility to try to take two locks at the same time.)
Do CPU cores tick at the same time?
Not necessarily, e.g. Intel "server" chips (most Xeons) let each core control its frequency-multiplier independently. However, even in a multi-socket system, the clock for all cores is normally still derived from the same source, so they can keep their TSC (which counts reference cycles, not core clocks) synced across cores.
Intel "client" chips, like desktop/laptop chips such as i7-6700, actually do use the same clock for all cores. A core is either in low-power sleep (clock halted) or running at the same clock frequency as any other active cores.
None of this has anything to do with locking, or making atomic RMW operations truly atomic, and probably should be split off to a separate Q&A. I'm sure there are plenty of non-x86 examples, but I happen to know how Intel CPUs do things.
objects
contain whatever internal runtime state is required to implement this feature, you don't need to know what that is. Why object? Because that's how Java did it. – Jeremy Lakemanlock()
is guaranteed to be atomic (can't be interrupted between checking if it's locked and setting the lock state), but your while loop isn't. Feedback for your question, don't ask multiple questions in a post. – Sweeper