9 votes

Suppose that I have a class that looks like this (actually exactly this size):

#include <boost/thread/shared_mutex.hpp>

class K
{
public:

    long long get_x() const; // lock m_mutex in shared/read-only mode

    void update( long long w ); // lock m_mutex with a unique_lock

private:
    long long m_a;
    long long m_b;
    long long m_c;
    long long m_x;
    double m_flow_factor;

    mutable boost::shared_mutex m_mutex;
};

As you can see, this should be thread-safe. The update function is guaranteed to be called by only one thread at a time (which thread it is may vary), while the accessor can be called by several threads simultaneously.

The update function changes all the values and is called very often (a hundred times a second). The current implementation, as you can guess, locks a lot.

I was considering using std::atomic to avoid locks and potentially make this code more efficient. However, I really need the update function to update the values together. Therefore, I am considering doing something like this instead:

#include <atomic>

class K
{
public:

    long long get_x() const
    { return data.load().x; }

    void update( long long w )
    {
        auto data_now = data.load();
        // ... work with data_now
        data.store( data_now );
    }

private:
    struct Data
    {
        long long a;
        long long b;
        long long c;
        long long x;
        double flow_factor;
    };
    std::atomic<Data> data;
};

My current understanding of std::atomic is that, even though this code is more readable than the previous one (because it doesn't have lock declarations everywhere), since the K::Data struct is "big", std::atomic will just be implemented with a normal mutex lock (so it shouldn't be any faster than my initial implementation anyway).
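For what it's worth, I know I could test this assumption at runtime with is_lock_free(); a minimal sketch, assuming a standalone Data struct identical to the nested one above:

#include <atomic>
#include <iostream>

struct Data
{
    long long a;
    long long b;
    long long c;
    long long x;
    double flow_factor;
};

int main()
{
    std::atomic<Data> data{ Data{} };
    // Prints false if this specialization falls back to internal locking.
    std::cout << std::boolalpha << data.is_lock_free() << '\n';
    // Since C++17 the same check is also available at compile time:
    // static_assert( std::atomic<Data>::is_always_lock_free );
}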

Am I correct?


3 Answers

13 votes

Any specialization of std::atomic for a struct like that is going to involve internal locking, so you've gained nothing, and now you also have a data race between the load and the store that you didn't have before, since the previous version had exclusive locking around the whole block (I presume?).
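If you do go the atomic route, the usual way to close that race is a compare-and-swap loop; a minimal sketch (assuming the Data struct from the question, and bearing in mind the CAS itself still locks internally for a struct of that size):

void update( long long w )
{
    Data expected = data.load();
    Data desired;
    do
    {
        desired = expected;
        // ... recompute desired from expected and w
    } while ( !data.compare_exchange_weak( expected, desired ) );
    // On failure, compare_exchange_weak refreshes 'expected' with the
    // current value, so the loop retries against up-to-date data and
    // no concurrent update is silently overwritten.
}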

Also, with the shared_mutex, it might be wise to profile a normal mutex against the shared_mutex: you may find the normal mutex performs better (it all depends on how long you hold your locks for).

The benefit of the shared_mutex is only seen when locks are held for reading for extended periods and writes are very infrequent; otherwise, the overhead involved in the shared_mutex kills any gains you would have over the normal mutex.
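For reference, the locking pattern the question describes would look roughly like this (a sketch, assuming Boost's lock types and the members from the question):

#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>

long long K::get_x() const
{
    // Shared lock: any number of readers may hold this at once.
    boost::shared_lock<boost::shared_mutex> lock( m_mutex );
    return m_x;
}

void K::update( long long w )
{
    // Exclusive lock: blocks all readers and any other writer.
    boost::unique_lock<boost::shared_mutex> lock( m_mutex );
    m_x = w; // illustrative; the real code updates all members together
}

Profiling then amounts to swapping boost::shared_mutex for a plain boost::mutex (with a plain lock in both functions) and timing under your actual read/write mix.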

1 vote

std::atomic is not necessarily slower than std::mutex. For example, in MSVC 14.0, the implementation behind std::atomic<T>::store (for types too large for hardware atomics) looks like this:

inline void _Atomic_copy(
    volatile _Atomic_flag_t *_Flag, size_t _Size,
    volatile void *_Tgt, volatile const void *_Src,
    memory_order _Order)
{   /* atomically copy *_Src to *_Tgt with memory ordering */
    _Lock_spin_lock(_Flag);
    _CSTD memcpy((void *)_Tgt, (void *)_Src, _Size);
    _Unlock_spin_lock(_Flag);
}

inline void _Lock_spin_lock(
    volatile _Atomic_flag_t *_Flag)
{   /* spin until _Flag successfully set */
    while (_ATOMIC_FLAG_TEST_AND_SET(_Flag, memory_order_acquire))
        _YIELD_PROCESSOR;
}

It is not guaranteed that a spin lock will be faster than a proper std::mutex; it depends on what exactly you are doing. But std::atomic is certainly not always a sub-optimal solution compared to std::mutex.
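For illustration, a portable equivalent of that spin lock can be written with std::atomic_flag (a minimal sketch, not the actual MSVC source):

#include <atomic>

class spin_lock
{
public:
    void lock()
    {
        // Spin until we atomically flip the flag from clear to set.
        while ( m_flag.test_and_set( std::memory_order_acquire ) )
            ; // busy-wait; the MSVC version yields the processor here
    }

    void unlock()
    {
        m_flag.clear( std::memory_order_release );
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};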

1 vote

Regardless of whether it is implemented with a lock, your default usage of atomic (i.e. with sequential consistency) is quite slow, likely just as slow as, if not slower than, the mutex. Adjusting the memory ordering in update() might lead to better performance:

void update( long long w )
{
    auto data_now = data.load(std::memory_order_acquire);
    // ... work with data_now
    data.store(data_now, std::memory_order_release);
}

Also, if you don't need sequential consistency between your threads (i.e. you don't synchronize the writer with the readers), you can further optimize the reader:

long long get_x() const
{ return data.load(std::memory_order_relaxed).x; }
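Putting the two adjustments together, the class from the question might look like this (a sketch under the same single-writer assumption; the x = w assignment is just illustrative):

#include <atomic>

class K
{
public:

    long long get_x() const
    {
        // Relaxed: readers need no ordering relative to other data.
        return data.load( std::memory_order_relaxed ).x;
    }

    void update( long long w )
    {
        auto data_now = data.load( std::memory_order_acquire );
        data_now.x = w; // illustrative; the real code reworks all fields
        data.store( data_now, std::memory_order_release );
    }

private:
    struct Data
    {
        long long a;
        long long b;
        long long c;
        long long x;
        double flow_factor;
    };
    std::atomic<Data> data;
};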