6
votes

I have been looking at a lock free single producer/single consumer circular buffer on this website when I couldn't figure out why a specific memory barrier was needed. I have carefully read a hundread time the standard rules about memory order but I don't understand what I'm missing.

With this implementation, there is only a unique thread which can call the push() function and another unique thread which can call the pop() function.

Here is the Producer code:

bool push(const Element& item)
{       
  const auto current_tail = _tail.load(std::memory_order_relaxed);  //(1)
  const auto next_tail = increment(current_tail);

  if(next_tail != _head.load(std::memory_order_acquire))            //(2)               
  {     
    _array[current_tail] = item;                                    //(3)
    _tail.store(next_tail, std::memory_order_release);              //(4)
    return true;
  }
  return false; // full queue
}

Here is the Consumer code:

bool pop(Element& item)
{
  const auto current_head = _head.load(std::memory_order_relaxed);    //(1)
  if(current_head == _tail.load(std::memory_order_acquire))           //(2)
    return false; // empty queue

  item = _array[current_head];                                       //(3)
  _head.store(increment(current_head), std::memory_order_release);   //(4)
  return true;
}

I understand why the Producer (4) and the Consumer (2) statements are absolutly needed, this is because we have to make sure that all writes that happened before the (4) released store by the Producerwill be visible side effects once the consumer will see the stored value.

I also understand why the Consumer (4) statement is needed, this is to make sure that the Consumer (3) load will be performed before the Consumer (4) store will be performed.

The question

  • Why is the Producer (2) load needs to be performed with acquire semantic (instead of relaxed)? Is it to prevent Producer (3) or (4) to be reodered before the condition (at compile time or at runtime)?
2
One question per question please.πάντα ῥεῖ
@πάνταῥεῖ done ;)Qzaac

2 Answers

4
votes

we need prove that

_array[current_tail] = item; // push(3)

excecuted after conforming (current_head == current_tail)

item = _array[current_head]; // pop(3)

is completed. we can overwrite cell, only after data from it already copied to item

the

_head.load(std::memory_order_acquire) // push(2)

synchronized with

_head.store(increment(current_head), std::memory_order_release);   //pop(4)

via Release-Acquire ordering:

all memory writes ( pop(3) ) that happened-before the atomic store-release ( pop(4) ) on _head become visible side-effects once the atomic load-acquire ( push(2) ) is completed on _head.

so Producer code after push(2) is completed, guaranteed to see result of pop(3). this mean that data from _array[current_head] is copied to item and result of this operation is visible for Producer code after push(2), so _array[current_head] already free.

from another side from memory_order_acquire load description - no reads or writes ( push(3) ) in the current thread can be reordered before this load. so push(3) will be execute already after push(2) load complete, but at this point pop(3) already completed

item = _array[current_head];                                        //pop(3)
_head.store(increment(current_head), std::memory_order_release);    //pop(4)
-----
    _head.load(std::memory_order_acquire);                          //push(2)
    _array[current_tail] = item;                                    //push(3)         
3
votes

The memory barriers prevent the CPU from reordering accesses to the Element object, which are not using interlocks, across access to the queue structure (here implemented using indexes, but pointers are equally viable).

Using your numbering, it is essential that (3) is performed between (2) and (4), and a memory barrier provides that.

The exact case you ask about of (2)-vs-(3) in the Producer prevents speculatively overwriting valid data when the queue is full (the proposed site overlaps with valid data). Without the barrier, even though when the condition failed the original data would be restored from the perspective of the Producer thread, intermediate values might be briefly visible to the Consumer.