LOCK CMPXCHG on non-cached memory?

Question

A simple question: is a LOCK CMPXCHG possible on non-cached memory, i.e., on pages marked as non-cached in the page tables?

Why not? I don't recall anything in the manuals saying otherwise. If anything, it's cacheable memory that is treated specially. — Margaret Bloom
It's easier with cache coherency, I think. With cache coherency, the core can simply keep the cache line in its cache until the operation has finished. Without it, I can't imagine a more realistic way than locking the "bus". — Bonita Montero
Yes, that's how it was implemented pre-QPI. Now there is a "virtual" analogue of the #lock signal: one agent acts as the Quiesce Master and instructs the others not to start any further transactions. See the locking section of this very good article. — Margaret Bloom

Margaret Bloom · Accepted Answer · 2017-05-07T19:26:32

The content of this answer closely resembles that of this Dr. Dobb's article, particularly its "Locking" section, which I consulted to understand locking on QuickPath Interconnect (QPI)-enabled systems.
As such, this post has been marked as community wiki.

Yes, it's possible.

The 8086 had no cache but was able to perform atomic operations.
This was accomplished thanks to the #lock signal on the FSB. While the signal was asserted, no other agent could start a new transaction; only the locking agent's transactions could proceed (and sometimes not even those), so the system was effectively quiesced.

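With the introduction of caching, the need for a bus lock was reduced. If the locked location is held in its cache, the processor can lock the cache line instead, delaying any snoop requests from other agents until the locked operation completes.
However, the legacy bus lock was preserved for backwards compatibility, because the guarded variable could span two cache lines, and because a location that is not cached at all (such as one in a non-cached page) cannot be locked in a cache.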

When the FSB was dropped in favour of QPI (consider the move away from the hub architecture and the needs of multi-socket systems), the #lock signal was dropped, too.

Now one of the QPI agents is designated as the Quiesce Master (QM). When a processor wants a lock, it asks the QM, which in turn instructs the other agents, including DMA agents, to stop issuing new requests.
Once every agent has acknowledged, the QM tells the requester that the system is locked. The atomic operation is then carried out, and upon completion an unlock request is sent to the QM. Finally, the QM informs the other agents that new transactions are allowed again.
In this way, the mechanisms for locking the entire memory subsystem are still present and functional in modern designs.
