11
votes

I'd like to minimize synchronization and write lock-free code when possible in a project of mine. When absolutely necessary I'd love to substitute light-weight spinlocks built from atomic operations for pthread and win32 mutex locks. My understanding is that these are system calls underneath and could cause a context switch (which may be unnecessary for very quick critical sections where simply spinning a few times would be preferable).

The atomic operations I'm referring to are well documented here: http://gcc.gnu.org/onlinedocs/gcc-4.4.1/gcc/Atomic-Builtins.html

Here is an example to illustrate what I'm talking about. Imagine a RB-tree with multiple readers and writers possible. RBTree::exists() is read-only and thread safe, RBTree::insert() would require exclusive access by a single writer (and no readers) to be safe. Some code:

class IntSetTest
{
private:
    unsigned short lock;
    RBTree<int>* myset;

public:
    // ...

    void add_number(int n)
    {
        // Aquire once locked==false (atomic)
        while (__sync_bool_compare_and_swap(&lock, 0, 0xffff) == false);

        // Perform a thread-unsafe operation on the set
        myset->insert(n);

        // Unlock (atomic)
        __sync_bool_compare_and_swap(&lock, 0xffff, 0);
    }

    bool check_number(int n)
    {
        // Increment once the lock is below 0xffff
        u16 savedlock = lock;
        while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
            savedlock = lock;

        // Perform read-only operation    
        bool exists = tree->exists(n);

        // Decrement
        savedlock = lock;
        while (__sync_bool_compare_and_swap(&lock, savedlock, savedlock-1) == false)
            savedlock = lock;

        return exists;
    }
};

(lets assume it need not be exception-safe)

Is this code indeed thread-safe? Are there any pros/cons to this idea? Any advice? Is the use of spinlocks like this a bad idea if the threads are not truly concurrent?

Thanks in advance. ;)

2
The answer I gave in a similar question, stackoverflow.com/questions/1919135/… , will probably be relevant here.Aidan Cully
Your answer was definitely relevant to the issue of using spinlocks in general. They seem like such a good idea for smp machines in they typical case. Would the worst-case situation (a writer that stops running during the critical section) even out with the more likely case of two concurrent threads trying to insert at the same time? What about in a hybrid threading environment where user threads are mapped onto a number of kernel threads equal to the number of logical processors on the machine? The worst-case situation would be even less likely then; no?Thomas
I'm not sure the extent to which the number of kernel threads affects the likelihood of running into performance issues. It's possible that the writer-thread has just used up its time-slice between the entry and exit of the lock, which would lead to the problem case no matter how many kernel threads there are. On this point, I'll note that the RB-tree insert operation is O(log(n)), so the larger the tree, the more likely this problem is to occur. Also, a larger tree is more likely to cause page-faults during update, which would also make the problem case more likely. I'd avoid spinlocks here.Aidan Cully
As I mentioned under DeadMG's answer, the "lightweight" aspect of this could be important. No doubt a mutex has considerably more overhead than a 16bit value and some atomic operations. Would adding sched_yield() or a similar call in the "spin" improve things in your estimation?Thomas

2 Answers

4
votes

You need a volatile qualifier on lock, and I would also make it a sig_atomic_t. Without the volatile qualifier, this code:

    u16 savedlock = lock;
    while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
        savedlock = lock;

may not re-read lock when updating savedlock in the body of the while-loop. Consider the case that lock is 0xffff. Then, savedlock will be 0xffff prior to checking the loop condition, so the while condition will short-circuit prior to calling __sync_bool_compare_and_swap. Since __sync_bool_compare_and_swap wasn't called, the compiler doesn't encounter a memory barrier, so it might reasonably assume that the value of lock hasn't changed underneath you, and avoid re-loading it in savedlock.

Re: sig_atomic_t, there's a decent discussion here. The same considerations that apply to signal handlers would also apply to threads.

With these changes, I'd guess that your code would be thread-safe. I would still recommend using mutexes, though, since you really don't know how long your RB-tree insert will take in the general case (per my previous comments under the question).

1
votes

It may be worth noting that if you're using the Win32 mutexes, that from Vista onwards a thread pool is provided for you. Depending on what you use the RB tree for, you could replace with that.

Also, what you should remember is that atomic operations are not particularly fast. Microsoft said they were a couple hundred cycles, each.

Rather than trying to "protect" the function in this way, it would likely be much more efficient to simply synchronize the threads, either changing to a SIMD/thread pool approach, or to just use a mutex.

But, of course, without seeing your code, I can't really make any more comments. The trouble with multithreading is that you have to see someone's whole model to understand it.