How can I reproduce the race conditions in this python code reliably?

Question

Context

I recently posted a timer class for review on Code Review. I'd had a gut feeling there were concurrency bugs, as I'd once seen one unit test fail, but I was unable to reproduce the failure. Hence my post to Code Review.

I got some great feedback highlighting various race conditions in the code. (I thought) I understood the problem and the solution, but before making any fixes, I wanted to expose the bugs with a unit test. When I tried, I realised it was difficult. Various Stack Exchange answers suggested I'd have to control the execution of threads to expose the bug(s), and that any contrived timing would not necessarily be portable to a different machine. This seemed like a lot of accidental complexity beyond the problem I was trying to solve.

Instead I tried using the best static analysis (SA) tool for Python, PyLint, to see whether it would pick out any of the bugs, but it couldn't. Why could a human find the bugs through code review (essentially SA) when an SA tool could not?

Afraid of trying to get Valgrind working with Python (which sounded like yak-shaving), I decided to have a bash at fixing the bugs without reproducing them first. Now I'm in a pickle.

Here's the code now.

from threading import Timer, Lock
from time import time

class NotRunningError(Exception): pass
class AlreadyRunningError(Exception): pass


class KitchenTimer(object):
    '''
    Loosely models a clockwork kitchen timer with the following differences:
        You can start the timer with arbitrary duration (e.g. 1.2 seconds).
        The timer calls back a given function when time's up.
        Querying the time remaining has 0.1 second accuracy.
    '''

    PRECISION_NUM_DECIMAL_PLACES = 1
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    TIMEUP  = "TIMEUP"

    def __init__(self):
        self._stateLock = Lock()
        with self._stateLock:
            self._state = self.STOPPED
            self._timeRemaining = 0

    def start(self, duration=1, whenTimeup=None):
        '''
        Starts the timer to count down from the given duration and call whenTimeup when time's up.
        '''
        with self._stateLock:
            if self.isRunning():
                raise AlreadyRunningError
            else:
                self._state = self.RUNNING
                self.duration = duration
                self._userWhenTimeup = whenTimeup
                self._startTime = time()
                self._timer = Timer(duration, self._whenTimeup)
                self._timer.start()

    def stop(self):
        '''
        Stops the timer, preventing whenTimeup callback.
        '''
        with self._stateLock:
            if self.isRunning():
                self._timer.cancel()
                self._state = self.STOPPED
                self._timeRemaining = self.duration - self._elapsedTime()
            else:
                raise NotRunningError()

    def isRunning(self):
        return self._state == self.RUNNING

    def isStopped(self):
        return self._state == self.STOPPED

    def isTimeup(self):
        return self._state == self.TIMEUP

    @property
    def timeRemaining(self):
        if self.isRunning():
            self._timeRemaining = self.duration - self._elapsedTime()
        return round(self._timeRemaining, self.PRECISION_NUM_DECIMAL_PLACES)

    def _whenTimeup(self):
        with self._stateLock:
            self._state = self.TIMEUP
            self._timeRemaining = 0
            if callable(self._userWhenTimeup):
                self._userWhenTimeup()

    def _elapsedTime(self):
        return time() - self._startTime

Question

In the context of this code example, how can I expose the race conditions, fix them, and prove they're fixed?

Extra points

Extra points for a testing framework that applies to other implementations and problems, not just to this code.

Takeaway

My takeaway is that the technical solution for reproducing the identified race conditions is to control the interleaving of the two threads so that they execute in the order that exposes a bug. The important point here is that these are already identified race conditions. The best way I've found to identify race conditions in the first place is to put your code up for code review and encourage more experienced people to analyse it.
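A minimal sketch of that idea, independent of KitchenTimer and assuming Python 3: two threads do an unlocked read-modify-write on a shared counter, and threading.Event objects force the order "T1 reads, T2 reads, T1 writes, T2 writes", so the lost update happens on every run instead of only by chance. The function lost_update_demo and its shared dict are hypothetical, chosen only for illustration.

```python
import threading

def lost_update_demo():
    """Hypothetical demo: force a lost update on an unlocked counter.

    Events dictate the interleaving read1, read2, write1, write2, so the
    race is reproduced deterministically rather than by timing luck.
    """
    shared = {"count": 0}
    t1_read = threading.Event()
    t2_read = threading.Event()
    t1_wrote = threading.Event()

    def thread_1():
        value = shared["count"]      # step 1: T1 reads 0
        t1_read.set()
        t2_read.wait()               # let T2 read the same stale value first
        shared["count"] = value + 1  # step 3: T1 writes 1
        t1_wrote.set()

    def thread_2():
        t1_read.wait()
        value = shared["count"]      # step 2: T2 also reads 0
        t2_read.set()
        t1_wrote.wait()
        shared["count"] = value + 1  # step 4: T2 writes 1, overwriting T1's write

    threads = [threading.Thread(target=thread_1), threading.Thread(target=thread_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]

print(lost_update_demo())  # 1, not 2: one increment was lost
```

Wrapping the read-modify-write in a shared lock would make this particular forced ordering impossible: T2 could not read until T1 had written.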

PyLint knows nothing about threads - that's why it didn't help. In general, you're addressing very hard problems here. Follow the references here and you'll discover they're not of much help :-( — Tim Peters
Very hard in general, doesn't mean impossible though, right? I'm looking for an answer specific to this example. So far the only way I managed to detect race conditions is through code review. But will it detect them reliably? And is there a faster way to find out if I've fixed (or introduced) a concurrency bug? — doughgle
The better way is to code defensively, in this case to synchronize the timeRemaining method as well, as @perreal suggests. As far as tests go, brute force is a quite solid option. — flup
Generally it's best to think it through and make sure that there are no race conditions since those are really hard to detect (may happen only one in a million times you run the program). — mb21
Maybe before_after can help: oreills.co.uk/2015/03/01/testing-race-conditions-in-python.html — Jérôme
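On the brute-force side, here is a sketch of a small, reusable stress harness. The name hammer and its parameters are hypothetical, not from any library. It assumes Python 3.2+ for threading.Barrier and sys.setswitchinterval: the barrier releases all worker threads together, and a tiny switch interval makes the interpreter switch threads far more often, which widens the chance of hitting a bad interleaving. A clean run only means no failure was observed, not that no race exists.

```python
import sys
import threading

def hammer(target, num_threads=8, iterations=1000, switch_interval=1e-6):
    """Hypothetical helper: call target(thread_index, iteration) from many threads at once.

    Returns every exception that target raised, in no particular order.
    """
    barrier = threading.Barrier(num_threads)  # release all workers together
    errors = []
    errors_lock = threading.Lock()

    def worker(thread_index):
        barrier.wait()
        for iteration in range(iterations):
            try:
                target(thread_index, iteration)
            except Exception as exc:
                with errors_lock:
                    errors.append(exc)

    old_interval = sys.getswitchinterval()
    sys.setswitchinterval(switch_interval)  # force much more frequent thread switches
    try:
        threads = [threading.Thread(target=worker, args=(n,)) for n in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.setswitchinterval(old_interval)  # always restore the interpreter setting
    return errors

# Usage: a correctly locked counter should survive the hammering.
counter = {"value": 0}
counter_lock = threading.Lock()

def increment(thread_index, iteration):
    with counter_lock:
        counter["value"] += 1

errors = hammer(increment, num_threads=4, iterations=500)
print(errors, counter["value"])  # [] 2000
```

For KitchenTimer, a target might call start, stop and timeRemaining, treat AlreadyRunningError and NotRunningError as expected outcomes, and let anything else be recorded as a failure.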

Tim Pierce · Accepted Answer · 2013-11-27T06:07:03

Traditionally, forcing race conditions in multithreaded code is done with semaphores, so you can force a thread to wait until another thread has reached some edge condition before continuing.

For example, your object has some code to check that start is not called if the object is already running. You could force this condition to make sure it behaves as expected by doing something like this:

1. starting a KitchenTimer
2. having the timer block on a semaphore while in the running state
3. starting the same timer in another thread
4. catching AlreadyRunningError

To do some of this you may need to extend the KitchenTimer class. Formal unit tests will often use mock objects which are defined to block at critical times. Mock objects are a bigger topic than I can address here, but googling "python mock object" will turn up a lot of documentation and many implementations to choose from.
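As a sketch of a mock that blocks at a critical time, assuming Python 3's unittest.mock: the Cache class and the loads_twice test below are hypothetical stand-ins, not part of KitchenTimer. Cache has a check-then-act race (its loader runs outside the lock). The loader is a Mock whose side_effect parks each caller until the test releases it, so the test can prove that two threads were inside the loader at the same time.

```python
import threading
from unittest import mock

class Cache(object):
    """Hypothetical lazily loaded value with a check-then-act race."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._value = None

    def get(self):
        with self._lock:
            if self._value is not None:
                return self._value
        # Race window: the lock is released here, so a second thread can
        # also see _value as None and call the loader.
        value = self._loader()
        with self._lock:
            self._value = value
        return value

def loads_twice():
    """Hypothetical test: can two get() calls be inside the loader at once?"""
    entered = threading.Semaphore(0)  # released once per call that reaches the loader
    release = threading.Event()       # set by the test to let loader calls finish

    def blocking_load():
        entered.release()
        release.wait()
        return 42

    loader = mock.Mock(side_effect=blocking_load)
    cache = Cache(loader)
    threads = [threading.Thread(target=cache.get) for _ in range(2)]
    threads[0].start()
    entered.acquire()                         # thread 0 is now parked inside the loader
    threads[1].start()
    both_inside = entered.acquire(timeout=1)  # False if thread 1 never reaches the loader
    release.set()
    for t in threads:
        t.join()
    return both_inside, loader.call_count

print(loads_twice())  # (True, 2) for the racy Cache above
```

If get() is changed to call the loader while holding the lock, thread 1 blocks on the lock instead, and the same test returns (False, 1) after the one-second timeout, so it doubles as a check that the fix works.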

Here's a way that you could force your code to throw AlreadyRunningError:

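import threading

class TestKitchenTimer(KitchenTimer):
    '''
    KitchenTimer whose start() pauses the calling thread once the timer is RUNNING.
    Events are used instead of a bare Condition so that a resume() issued
    before the wait begins is not lost.
    '''

    def __init__(self):
        KitchenTimer.__init__(self)
        self.startedEvent = threading.Event()  # set once start() has made the timer RUNNING
        self._resumeEvent = threading.Event()  # set by resume() to let start() return

    def start(self, duration=1, whenTimeup=None):
        KitchenTimer.start(self, duration, whenTimeup)
        self.startedEvent.set()
        print("waiting on _resumeEvent")
        self._resumeEvent.wait()

    def resume(self):
        self._resumeEvent.set()

timer = TestKitchenTimer()

# Start the timer in a subthread. This thread will block as soon as
# the timer is running.
thread_1 = threading.Thread(target=timer.start, args=(10, None))
thread_1.start()
timer.startedEvent.wait()

# Attempt to start the same timer again, causing it to raise an
# AlreadyRunningError. Call start() directly: an exception raised inside a
# Thread's target is not propagated to the thread that started it.
try:
    timer.start(10, None)
except AlreadyRunningError:
    print("AlreadyRunningError")
finally:
    timer.resume()
    thread_1.join()
    timer.stop()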

Reading through the code, identify the boundary conditions you want to test, then think about where you would need to pause the timer to force each condition to arise, and add Conditions, Semaphores, Events, etc. to make it happen. For example, what happens if another thread tries to stop the timer just as the timer runs the whenTimeup callback? You can force that condition by making the timer wait as soon as it has entered _whenTimeup:

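import threading

class TestKitchenTimer(KitchenTimer):
    '''
    KitchenTimer whose countdown thread pauses after the timer expires but
    before the original _whenTimeup acquires the state lock.
    '''

    def __init__(self):
        KitchenTimer.__init__(self)
        self.expiredEvent = threading.Event()  # set when the countdown thread expires
        self._resumeEvent = threading.Event()  # set by resume() to let it continue

    def _whenTimeup(self):
        self.expiredEvent.set()
        self._resumeEvent.wait()
        KitchenTimer._whenTimeup(self)

    def resume(self):
        self._resumeEvent.set()

def TimeupCallback():
    print("TimeupCallback was called")

timer = TestKitchenTimer()

# start() returns immediately; the countdown runs in threading.Timer's own
# thread, which will block when the timer expires, before the callback is invoked.
timer.start(1, TimeupCallback)
timer.expiredEvent.wait()  # deterministic replacement for sleep(2)

# The countdown thread is now blocked. In the main thread, we stop the timer.
timer.stop()
print("timer is stopped: %r" % timer.isStopped())

# Now allow the countdown thread to resume. With the KitchenTimer above, it
# then sets the state to TIMEUP and calls TimeupCallback even though stop()
# succeeded, which is the race this test exposes.
timer.resume()
timer._timer.join()
print("timer is stopped: %r" % timer.isStopped())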

Subclassing the class you want to test isn't a great way to instrument it for testing: you'll have to override basically all of the methods in order to test race conditions in each one, and at that point there's a good argument that you're not really testing the original code. Instead, you may find it cleaner to put the semaphores right in the KitchenTimer object, initialized to None by default, and have your methods check whether testRunningLock is not None before acquiring or waiting on it. Then you can force races on the actual code that you're submitting.
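A sketch of that in-class approach, under stated assumptions: MiniTimer is a hypothetical, cut-down stand-in for KitchenTimer (not the original class), and Gate plus the timeupGate attribute are hypothetical test hooks. In production timeupGate stays None and costs one comparison; a test installs a Gate to park the countdown thread just before it takes the lock, reproducing the stop-versus-timeup race deterministically on the class's own code.

```python
import threading

class Gate(object):
    """Hypothetical test helper: parks a thread at a checkpoint until opened."""

    def __init__(self):
        self.reached = threading.Event()  # set when a thread arrives at the checkpoint
        self.opened = threading.Event()   # set by the test to let that thread continue

    def pause(self):
        self.reached.set()
        self.opened.wait()

class MiniTimer(object):
    """Hypothetical cut-down KitchenTimer with an optional, built-in test hook."""

    def __init__(self, callback):
        self._lock = threading.Lock()
        self._state = "STOPPED"
        self._callback = callback
        self.timeupGate = None  # hypothetical hook; production code leaves it as None

    def start(self, duration):
        with self._lock:
            self._state = "RUNNING"
            self._timer = threading.Timer(duration, self._whenTimeup)
            self._timer.start()

    def stop(self):
        with self._lock:
            self._timer.cancel()
            self._state = "STOPPED"

    def _whenTimeup(self):
        if self.timeupGate is not None:  # only a test installs a gate
            self.timeupGate.pause()
        with self._lock:
            self._state = "TIMEUP"
            self._callback()

calls = []
timer = MiniTimer(lambda: calls.append("timeup"))
timer.timeupGate = Gate()
timer.start(0.01)
timer.timeupGate.reached.wait()  # countdown thread is parked before taking the lock
timer.stop()                     # stop() returns normally...
timer.timeupGate.opened.set()
timer._timer.join()
print(timer._state, calls)       # TIMEUP ['timeup']: the callback still ran after stop()
```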

Some reading on Python mock frameworks may be helpful. In fact, I'm not sure that mocks would help in testing this code: it's almost entirely self-contained and doesn't rely on many external objects. But mock tutorials sometimes touch on issues like these. I haven't used any of these, but their documentation looks like a good place to get started:
