0
votes

I tried a lock-free single-producer/single-consumer queue implementation link. The implementation uses std::list as the underlying data structure and relies on the fact that only the producer thread modifies the list. The consumer thread advances the head iterator when head_ is not equal to tail_.

int produced_count_, consumed_count_;
std::list<int> data_queue_;
std::list<int>::iterator head_, tail_;    

void ProducerConsumer::produce() {
    static int count = 0;
    data_queue_.push_back(count++);
    ++produced_count_;
    tail_ = data_queue_.end();
    // reclaim the nodes the consumer has already read (everything before head_)
    data_queue_.erase(data_queue_.begin(), head_);
}

bool ProducerConsumer::consume() {
    auto it = head_;
    ++it;                  // first unread node
    if(it != tail_) {
        head_ = it;        // publish the new head
        ++consumed_count_;
        int t = *it;       // read the consumed value
        return true;
    }

    return false;
}

At any point, the head iterator points to a value that has already been read.

As there is no synchronization here, I was under the impression that the implementation would not work, since writes by one thread might not be visible to the other thread. But when I tested my code, the producer and consumer always produced/consumed the same number of units. Can someone explain how this code can work without explicit synchronization? (I did not expect changes to the tail_ and head_ variables to be visible to the other thread.)
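For reference, here is a hedged sketch of the minimal change I was expecting to need: wrapping head_ and tail_ in std::atomic. The init/produce/consume free functions are my simplification of the class above, and std::atomic over a list iterator compiles only where the iterator happens to be trivially copyable (true in practice for libstdc++ and libc++, but not guaranteed by the standard).

```cpp
#include <atomic>
#include <list>
#include <type_traits>

// Sketch only, not the original class: the stores get release ordering and
// the loads get acquire ordering, so published nodes are visible untorn.
using Iter = std::list<int>::iterator;
static_assert(std::is_trivially_copyable_v<Iter>,
              "std::atomic<Iter> needs a trivially copyable iterator");

std::list<int> data_queue_;
std::atomic<Iter> head_, tail_;

void init() {
    data_queue_.push_back(-1);              // dummy node; head_ points at it
    head_.store(data_queue_.begin());
    tail_.store(data_queue_.end());
}

void produce(int value) {                   // called from producer thread only
    data_queue_.push_back(value);
    tail_.store(data_queue_.end(), std::memory_order_release);
    // reclaim everything the consumer has already read (nodes before head_)
    data_queue_.erase(data_queue_.begin(),
                      head_.load(std::memory_order_acquire));
}

bool consume(int& out) {                    // called from consumer thread only
    Iter it = head_.load(std::memory_order_relaxed);
    ++it;                                   // first unread node
    if (it != tail_.load(std::memory_order_acquire)) {
        out = *it;                          // read before publishing new head
        head_.store(it, std::memory_order_release);
        return true;
    }
    return false;
}
```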

The code that controls the producer/consumer threads is as follows:

consumer_thread_ = std::thread([this]() {
    set_cpu_affinity(0);
    std::chrono::milliseconds start_time = current_time();
    while((current_time() - start_time) < std::chrono::milliseconds(150)) {
        this->consume();
    }
    std::cout << "Data queue size from consumer is " << data_queue_.size() << " time " << current_time().count() << "\n";
});

producer_thread_ = std::thread([this]() {
    set_cpu_affinity(7);
    std::chrono::milliseconds start_time = current_time();
    while((current_time() - start_time) < std::chrono::milliseconds(100)) {
        this->produce();
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::cout << "Data queue size from producer is " << data_queue_.size() << " time " << current_time().count() << "\n";
});

I ensure that the producer thread outlives the consumer thread by adding the sleep_for at the end of the producer thread.

BTW, here is a dissection of the implementation in which Herb Sutter discusses what's wrong with it link. But he never talks about whether changes to tail_ and head_ are visible to the other thread.

2
You're probably less likely to have issues if you use a std::vector with atomic numerical indexes rather than std::list::iterator, which has no atomicity guarantees. – Alan Birtles
I ran this code many times, with different compiler options, and didn't see any race conditions. – user14757101
It might happen to work on some or even most standard libraries, but there's no guarantee. – Alan Birtles

2 Answers

1
votes

Debug builds will often "happen to work", especially on x86, because the constraints debug mode puts on code-gen block compile-time reordering, and x86 hardware blocks most run-time reordering.

If you compile in debug mode, memory accesses will happen in program order, and the compiler won't keep values in registers across statements. (A bit like volatile, which can be used to roll your own atomics; but don't: When to use volatile with multi threading?). Still, cache is coherent, so simply loading and storing in asm is sufficient for global visibility (in some order).

They'll be atomic because they're int-sized and aligned, and the compiler does them with a single instruction because it's not a DeathStation 9000. Naturally aligned int loads and stores are atomic in asm on normal machines like x86, but that's not guaranteed in C. (https://lwn.net/Articles/793253/)

If you only test on x86, the hardware memory model gives you program order plus a store buffer, so you effectively get the same asm as you would with std::atomic memory_order_acquire and memory_order_release (because debug builds don't reorder between statements).

C++ undefined behaviour (including this data-race UB) does not mean "guaranteed to fail or crash" - that's what makes it so nasty and why testing isn't sufficient to find it.

Compile with optimization enabled and you might see big problems, depending on compile-time reordering and hoisting choices. e.g. if the compiler can keep a variable in a register for the duration of a loop, it'll never re-read from cache/memory and never see what the other thread stored. Among other problems. Multithreading program stuck in optimized mode but runs normally in -O0
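To illustrate (a hypothetical demo, not your code): with a plain bool flag, the compiler may hoist the load out of the spin loop at -O2 and the waiting thread spins forever; std::atomic forces a fresh read on every iteration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical demo of the hoisting problem. Change `flag` to a plain bool
// and compile at -O2 to (possibly) see the waiter spin forever: the compiler
// may keep the value in a register instead of re-reading memory.
std::atomic<bool> flag{false};

bool run_demo() {
    std::thread waiter([] {
        while (!flag.load(std::memory_order_acquire)) { }  // spin until set
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    flag.store(true, std::memory_order_release);
    waiter.join();   // returns promptly because the load can't be hoisted
    return flag.load();
}
```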

Code isn't very useful if it only happens to work in "training wheels" mode because you haven't told the compiler how to optimize it safely. (By using std::atomic for example).


I haven't looked in a lot of detail at your code, but I don't think you have any variables that are modified by both threads. In a circular-buffer queue, you often have a ++ increment on a variable that's RMWed by the producer but read-only by the consumer, and vice versa for a read position. Those don't need to be an atomic RMW, only an atomic store so the other thread's atomic load can see an untorn value. That happens naturally in asm.

Here I think you're just storing a new head, and the other thread is just reading that.
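As a sketch of that pattern (my illustration, assuming a fixed-capacity ring buffer with atomic indices; none of these names come from your code):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal SPSC ring-buffer sketch: each index is written by exactly one
// thread and only read by the other, so a release store paired with an
// acquire load is all the synchronization needed. Capacity is N-1 because
// one empty slot distinguishes "full" from "empty".
template <typename T, std::size_t N>
class SpscQueue {
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // written by consumer only
    std::atomic<std::size_t> tail_{0};  // written by producer only
public:
    bool push(const T& v) {                        // producer thread
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;                          // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the element
        return true;
    }
    std::optional<T> pop() {                       // consumer thread
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;                   // empty
        T v = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return v;
    }
};
```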

In a linked list, deallocation can be a problem especially with multiple consumers. You can't free or recycle the node until you're sure that no threads have a pointer to it. Garbage collected languages / runtimes can use linked lists for lockless queues much more easily because the GC already has to handle the same checking for objects in general.

So make sure you get this right if rolling your own; it can be tricky. Although as long as you only link a node into the linked list after it's constructed, and there's only one consumer, you never have visibility of half-constructed nodes. And you never have one thread deallocating a node that another thread might wake up and continue reading.

0
votes

The article says:

Another issue is using the standard std::list. While the article mentions that it is the developer responsibility to check that the reading/writing std::list::iterator is atomic, this turns out to be too restrictive. While gcc/MSVC++2003 has 4-byte iterators, the MSVC++2005 has 8-byte iterators in Release Mode and 12-byte iterators in the Debug Mode.

It is your responsibility to ensure that reads and writes of the iterators are atomic, and that is not the case for std::list. There are no guarantees on read/write operations from different threads unless you explicitly declare the data as atomic. However, even if "undefined behavior" means "nasal demons", there is nothing wrong if those demons happen to manifest as consistent synchronization.