Compare and Swap on x86 - why is it a full barrier?

Question

As per this question's answer, it seems that LOCK CMPXCHG on x86 actually causes a full barrier. Presumably, this is what Unsafe.compareAndSwapInt() generates under the hood as well. I am struggling to see why that is the case: with the MESI protocol, once the cache line has been updated, couldn't the CPU simply invalidate just that cache line on the other cores, rather than draining ALL store/load buffers of the core that performed the CAS? That seems rather wasteful to me...

With a full barrier, you would actually flush all your missed prediction changes instead of one cache line, so wouldn't it be worse with the full barrier? But obviously I am missing something here :) — Bober02
Compare-and-swap on Wikipedia covers this: it compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. Without a full barrier it might be interrupted (or otherwise updated), and that could invalidate atomicity. — Elliott Frisch
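
The compare-and-swap semantics described in the comment above can be sketched in Java with AtomicInteger.compareAndSet. This is only a minimal illustration; the class name CasSemantics and the method demo are hypothetical and not part of the original question.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class, not from the original post.
class CasSemantics {
    // Returns { firstCasSucceeded, secondCasSucceeded, finalValueIsSix }.
    static boolean[] demo() {
        AtomicInteger value = new AtomicInteger(5);

        // Expected value matches the current value, so the swap happens: 5 -> 6.
        boolean first = value.compareAndSet(5, 6);

        // Expected value is now stale (current value is 6), so nothing is written.
        boolean second = value.compareAndSet(5, 7);

        return new boolean[] { first, second, value.get() == 6 };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("first=" + r[0] + " second=" + r[1] + " valueIsSix=" + r[2]);
    }
}
```

A failed compareAndSet leaves the value untouched, which is why callers usually wrap it in a retry loop that re-reads the current value.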

Eugene · Accepted Answer · 2017-07-18T09:01:11

Your answer, as far as I can see, is in the comments: MESI keeps the caches coherent, not the Store/Load buffers. But the description of LOCK CMPXCHG says that locked operations serialize all outstanding load and store operations. This is why it needs to drain the Store/Load buffers of this CPU (and not of the others, as detailed here).

So the current CPU has to perform the atomic operation on the most recent value, and that value could still reside in its Store/Load buffers rather than in the cache. That is why a fence is needed there: it drains those buffers before the operation completes.
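
The effect of draining the store buffer can be illustrated with a classic store-buffer litmus test. The sketch below is hypothetical (the names CasBarrierLitmus, Trial, casFence and countBothZero are made up for this example). Each thread writes its own plain flag, performs a successful CAS on a shared AtomicInteger, then reads the other thread's flag. Without the CAS, x86 allows both threads to read 0, because each store can still be sitting in its core's store buffer when the following load executes. With the CAS in between, which HotSpot typically compiles to lock cmpxchg on x86, that outcome should not appear. At the Java level the guarantee comes from the happens-before edge created by the two CAS operations on the same variable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical litmus test, not from the original answer.
class CasBarrierLitmus {
    static final class Trial {
        int x, y;      // plain flags written by thread A and thread B
        int r1, r2;    // what each thread observed of the other's flag
        final AtomicInteger fence = new AtomicInteger(); // shared CAS target
    }

    // Retries until the CAS succeeds, so every call performs an atomic write.
    static void casFence(AtomicInteger a) {
        int v;
        do {
            v = a.get();
        } while (!a.compareAndSet(v, v + 1));
    }

    // Runs the litmus test and counts trials where both threads read 0.
    static int countBothZero(int trials) {
        int bothZero = 0;
        for (int i = 0; i < trials; i++) {
            Trial t = new Trial();
            Thread a = new Thread(() -> { t.x = 1; casFence(t.fence); t.r1 = t.y; });
            Thread b = new Thread(() -> { t.y = 1; casFence(t.fence); t.r2 = t.x; });
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (t.r1 == 0 && t.r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(2000));
    }
}
```

Removing the casFence calls makes the both-zero outcome legal again, although a test this coarse (one pair of threads per trial) may rarely or never observe it in practice.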
