6
votes

I have been using C++ for a long time, and now I'm starting to learn assembly and how processors work (not just for fun; I have to as part of a test program). While learning assembly, I started hearing some of the terms that come up when discussing multithreading, which I do a lot of in scientific computing. I'm struggling to get the full picture, and I'd appreciate some help widening it.

I learned that a bus, in its simplest form, is something like a multiplexer followed by a demultiplexer. Each end takes an address as input, in order to connect the two ends with some external component. The two ends can, based on the address, point to RAM, the graphics card, CPU registers, or anything else.

Now getting to my question: I keep hearing people argue about whether to use a mutex or an atomic for thread safety (I know there's no ultimate answer; my question is about the comparison, not which to pick). Here, for example, the claim was made that atomics are so bad that they will prevent a processor from doing a decent job, because of bus-locking.

Could someone please explain, in a little detail, what bus-locking is, and why atomics would differ from mutexes in this respect, when AFAIK a mutex needs at least two atomic operations (one to lock, one to unlock)?

3
Bus-locking happens for atomic read-modify-write operations only. A mutex needs to perform an RMW, too, in order to acquire the lock, so it doesn't prevent bus locking. Bus locking refers to the locking of the memory bus, so no other processor can access memory (perhaps only the location or cache line in question) while the bus is locked. In x86 it's effected by the LOCK prefix (which is implied for a memory XCHG). – Kerrek SB
(Other architectures may not be able to lock the bus and instead perform RMW operations in a loop.) – Kerrek SB
@kerrek I don't think that information is up to date. The bus (which doesn't even exist any more on modern Intel CPUs) is only used if cheaper cache locks, etc. aren't possible (writes straddling cache lines, etc.). It should be an exceedingly rare event. – Voo

3 Answers

6
votes

From Intel® 64 and IA-32 Architectures Software Developer’s Manual:

Beginning with the P6 family processors, when the LOCK prefix is prefixed to an instruction and the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted. Instead, only the processor’s cache is locked. Here, the processor’s cache coherency mechanism ensures that the operation is carried out atomically with regards to memory.

There are special non-temporal store instructions to bypass the cache. All other loads and stores normally go through the cache, unless the memory page is marked as non-cacheable (like GPU or PCIe device memory).

5
votes

"I learned that a bus, in its simplest form, is something like a multiplexer followed by a demultiplexer. Each of the ends"

Well, that's not correct. In its simplest form there's nothing to multiplex or demultiplex. It's just two things talking directly to each other. And in the not-so-simple case, a bus may have three or more devices connected. In that case, you start needing bus addresses, because you can no longer talk about "the other end".

Now if you've got multiple devices on a single bus, they generally can't all talk at the same time, so there must be some mechanism to prevent that. Yet for all devices to be able to share that bus, they must be able to take turns in who is talking to whom. Bus locking, as a broad term, means any deviation from that usual pattern, in which two devices reserve the bus for their mutual conversation.

In the particular context of the x86 memory bus, this means keeping the bus locked during a read-modify-write cycle (as Kerrek SB pointed out in the comments). This may sound like a simple bus with two devices (memory and CPU), but DMA and multi-core chips make it not that simple.

2
votes

Bus locks are required when more than one resource is needed to complete the access. Usually, locked operations that don't span a cache line and target cacheable memory don't require a bus lock: the core just acquires the line in exclusive state and can NACK other cores that try to access it.

Non-cacheable memory types do require bus locks, as does a misaligned locked operation that spans a cache line, and any other transaction that needs multiple resources.

If not all resources can be acquired, processes could deadlock. This happens when multiple processes each grab some of the resources, such that no one process holds everything it needs to make forward progress.