Is synchronizing with `std::mutex` slower than with `std::atomic(memory_order_seq_cst)`?

Question

The main reason for using atomics over mutexes, is that mutexes are expensive but with the default memory model for atomics being memory_order_seq_cst, isn't this just as expensive?

Question: Can concurrent a program using locks be as fast as concurrent lock-free program?

If so, it may not be worth the effort unless I want to use memory_order_acq_rel for atomics.

Edit: I may be missing something but lock-based cant be faster than lock-free because each lock will have to be a full memory barrier too. But with lock-free, it's possible to use techniques that are less restrictive then memory barriers.

So back to my question, is lock-free any faster than lock based in new C++11 standard with default memory_model?

Is "lock-free >= lock-based when measured in performance" true? Let's assume 2 hardware threads.

Edit 2: My question is not about progress guarantees, and maybe I'm using "lock-free" out of context.

Basically when you have 2 threads with shared memory, and the only guarantee you need is that if one thread is writing then the other thread can't read or write, my assumption is that a simple atomic compare_and_swap operation would be much faster than locking a mutex.

Because if one thread never even touches the shared memory, you will end up locking and unlocking over and over for no reason but with atomic operations you only use 1 CPU cycle each time.

In regards to the comments, a spin-lock vs a mutex-lock is very different when there is very little contention.

Well, there's different progress guarantees between locks, lock-free , and wait-free code. — Ben Voigt
atomic operations (whether they are used for lock free stuff or to implement locks) take MUCH longer than "1 CPU cycle". used to be in the triple-digits (!) on Pentium 4, now it's down to low double digits -- but still much more than 1. — Paul Groke
@curiousguy Ah. I think I understand now - at least somewhat. In that case the answer is "cached". I was correcting the OP's claim, and for that assuming optimal conditions for his claim, i.e. the fastest atomic CAS that you can get. Which on prescott was about 100 cycles and on more modern x86 CPUs is in the 15-20 cycle range. ps: I still find the use of the word "hypothesis" confusing. I would have used "conditions". — Paul Groke

Kerrek SB Kerrek SB · Accepted Answer · 2013-04-30T20:32:13

Lockfree programming is about progress guarantees: From strongest to weakest, those are wait-free, lock-free, obstruction-free, and blocking.

A guarantee is expensive and comes at a price. The more guarantees you want, the more you pay. Generally, a blocking algorithm or datastructure (with a mutex, say) has the greatest liberties, and thus is potentially the fastest. A wait-free algorithm on the other extreme must use atomic operations at every step, which may be much slower.

Obtaining a lock is actually rather cheap, so you should never worry about that without a deep understanding of the subject. Moreover, blocking algorithms with mutexes are much easier to read, write and reason about. By contrast, even the simplest lock-free data structures are the result of long, focused research, each of them worth one or more PhDs.

In a nutshell, lock- or wait-free algorithms trade worst latency for mean latency and throughput. Everything is slower, but nothing is ever very slow. This is a very special characteristic that is only useful in very specific situations (like real-time systems).

Is synchronizing with `std::mutex` slower than with `std::atomic(memory_order_seq_cst)`?

3 Answers