I have been looking at a lock-free single-producer/single-consumer circular buffer on this website, and I can't figure out why a specific memory barrier is needed. I have carefully read the standard's rules on memory ordering a hundred times, but I don't understand what I'm missing.
With this implementation, exactly one thread may call the push() function and exactly one other thread may call the pop() function.
Here is the Producer code:
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); //(1)
    const auto next_tail = increment(current_tail);
    if(next_tail != _head.load(std::memory_order_acquire)) //(2)
    {
        _array[current_tail] = item; //(3)
        _tail.store(next_tail, std::memory_order_release); //(4)
        return true;
    }
    return false; // full queue
}
Here is the Consumer code:
bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed); //(1)
    if(current_head == _tail.load(std::memory_order_acquire)) //(2)
        return false; // empty queue
    item = _array[current_head]; //(3)
    _head.store(increment(current_head), std::memory_order_release); //(4)
    return true;
}
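For anyone who wants to compile and run the two functions, here is a minimal self-contained sketch. Only push() and pop() come from the code above; the class name CircularFifo, the Capacity parameter, the member declarations, the body of increment() and the transfer_sum() helper are assumptions made for illustration, not necessarily what the original article uses.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical wrapper class: name, Capacity and member layout are assumed.
template <typename Element, std::size_t Capacity>
class CircularFifo {
public:
    // Called by the single producer thread only.
    bool push(const Element& item) {
        const auto current_tail = _tail.load(std::memory_order_relaxed);  //(1)
        const auto next_tail = increment(current_tail);
        if (next_tail != _head.load(std::memory_order_acquire)) {         //(2)
            _array[current_tail] = item;                                  //(3)
            _tail.store(next_tail, std::memory_order_release);            //(4)
            return true;
        }
        return false;  // full queue
    }

    // Called by the single consumer thread only.
    bool pop(Element& item) {
        const auto current_head = _head.load(std::memory_order_relaxed);  //(1)
        if (current_head == _tail.load(std::memory_order_acquire))        //(2)
            return false;  // empty queue
        item = _array[current_head];                                      //(3)
        _head.store(increment(current_head), std::memory_order_release);  //(4)
        return true;
    }

private:
    // Assumed implementation: one slot stays unused so that "full" and
    // "empty" can be told apart.
    static std::size_t increment(std::size_t idx) {
        return (idx + 1) % (Capacity + 1);
    }

    std::atomic<std::size_t> _tail{0};
    std::atomic<std::size_t> _head{0};
    std::array<Element, Capacity + 1> _array{};
};

// Hypothetical test helper: one thread pushes 0..count-1, another pops them,
// and the sum observed by the consumer is returned.
long long transfer_sum(int count) {
    CircularFifo<int, 8> queue;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < count; ++i)
            while (!queue.push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int value = 0;
        for (int i = 0; i < count; ++i) {
            while (!queue.pop(value)) std::this_thread::yield();
            assert(value == i);  // items arrive in FIFO order
            sum += value;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Calling transfer_sum(1000) exercises both the full and the empty paths, since the assumed capacity of 8 is much smaller than the number of items.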
I understand why the Producer (4) and the Consumer (2) statements are absolutely needed: we have to make sure that all writes that happened before the Producer's (4) release store are visible side effects once the Consumer sees the stored value.
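The release/acquire pairing described above can be reduced to a minimal sketch outside the queue. The names payload, ready and receive_after_publish are hypothetical: payload stands in for the _array slot and ready stands in for _tail.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: returns the value the reader observes in payload.
int receive_after_publish() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread writer([&] {
        payload = 42;                                  // like Producer (3)
        ready.store(true, std::memory_order_release);  // like Producer (4)
    });
    std::thread reader([&] {
        // Like Consumer (2): once this acquire load reads the value written
        // by the release store, the write to payload happens-before the
        // read below.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;                                // like Consumer (3)
    });

    writer.join();
    reader.join();
    return seen;
}
```

If either side were relaxed instead, the read of payload would no longer be ordered after the write and the program would contain a data race.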
I also understand why the Consumer (4) statement is needed: it makes sure that the Consumer (3) load is performed before the Consumer (4) store.
The question
- Why does the Producer (2) load need to be performed with acquire semantics (instead of relaxed)? Is it to prevent Producer (3) or (4) from being reordered before the condition (at compile time or at run time)?