1
votes

I have two threads, Producer and Consumer. Data exchange is controlled by two pointers inside std::atomics:

std::atomic<TNode*> next = nullptr;
std::atomic<TNode*> waiting = nullptr;

Thread Producer publishes the prepared data and afterwards checks the value of waiting:

TNode* newNext = new TNode( );
// ... fill *newNext ...
next.store( newNext, std::memory_order_release );
TNode* oldWaiting = waiting.load( std::memory_order_seq_cst );
if( oldWaiting == nullptr )
{
  /* wake up Consumer */
}

It is crucial that the load on waiting comes after the store on next. However, std::memory_order_seq_cst gives much stronger guarantees than I really need, since I only need the order of those two accesses fixed. Is it possible to get that ordering without memory_order_seq_cst?

Here is the rest of the picture:

Thread Consumer checks next. If it finds it empty it sets waiting to signal Producer before blocking itself.

TNode* newCurrent = next.load( std::memory_order_consume );
if( newCurrent == nullptr )
{
  waiting.store( current, std::memory_order_relaxed );
  /* wait, blocking, for next != nullptr */
}
current = newCurrent;

The whole thing is a producer-consumer queue that keeps the need for locking low without requiring any complicated mechanisms. next is actually inside the current node of a singly linked list. Data typically comes in bursts, so in most cases Consumer finds a whole bunch of nodes ready for consuming; apart from rare cases, both threads only go through the locking and blocking/wakeup once between bursts.

3 Answers

2
votes

You're essentially looking for the mirror order of memory_order_release. That is memory_order_acquire.

This is slightly stronger than you ask for: no memory access that follows the .load can be reordered before it. However, CPUs in general do not offer a way to partially order just two accesses, and C++ therefore doesn't have this granularity either.

C++ in theory has release/consume ordering as well, but nobody really needs that.

2
votes

Be very careful. You're going to need a release fence and an acquire fence somewhere to make sure the writes you performed during:

TNode* newNext = new TNode( );
// ... fill *newNext ...

are visible to the consumer.

The nearest you can do is perform a 'relaxed' read of the atomic in the Consumer then perform acquire and start 'consuming' the object. On some (most?) architectures that is likely to have no effect.
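To make the relaxed-read-then-acquire idea concrete, here is a minimal sketch. Node, its payload field and relaxed_then_acquire_demo are made-up names for illustration (the question's TNode isn't shown), and the consumer simply spins instead of blocking:

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for the question's TNode.
struct Node {
    int payload = 0;
};

// Returns the payload as observed by the consumer thread.
int relaxed_then_acquire_demo() {
    std::atomic<Node*> next{nullptr};
    int seen = -1;

    std::thread consumer([&] {
        Node* n = nullptr;
        // Relaxed polling: the load itself imposes no ordering.
        while ((n = next.load(std::memory_order_relaxed)) == nullptr) {
            std::this_thread::yield();
        }
        // The acquire fence pairs with the producer's release store,
        // so the payload write is visible before we read it.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = n->payload;
    });

    Node* node = new Node;
    node->payload = 42;                           // ... fill *node ...
    next.store(node, std::memory_order_release);  // publish

    consumer.join();
    delete node;
    return seen;
}
```

Calling relaxed_then_acquire_demo() from main should give back the value the producer wrote. Whether this is any cheaper than a plain acquire load depends on the target architecture.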

Have a read of 'A Walkthrough Using Acquire and Release Fences' here http://preshing.com/20130922/acquire-and-release-fences/.

I couldn't write something closer to a worked example of exactly what you're doing. Producer/Consumer is (face it) the textbook challenge.

Slightly off question. I would use a std::condition_variable. They're made for this.
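A rough sketch of what I mean, assuming a plain std::queue of ints guarded by a mutex. BlockingQueue and sum_through_queue are names I made up for the example; they are not from the question:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical queue: the producer never waits for the consumer,
// it only holds the mutex long enough to push one item.
class BlockingQueue {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        ready_.notify_one();  // wake the consumer if it is blocked
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form handles spurious wakeups and the case
        // where items arrived before the consumer started waiting.
        ready_.wait(lock, [this] { return !items_.empty(); });
        int value = items_.front();
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
};

// Pushes 1..count from this thread and sums them on a consumer thread.
int sum_through_queue(int count) {
    BlockingQueue queue;
    int sum = 0;
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) {
            sum += queue.pop();
        }
    });
    for (int i = 1; i <= count; ++i) {
        queue.push(i);
    }
    consumer.join();
    return sum;
}
```

This takes the lock on every push and pop, so it gives up the low-locking goal of the question in exchange for being easy to reason about.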

Slightly further off question: I'm not too keen on your locking strategy. It depends on how long the producer and consumer might take, but if the producer 'bursts' like you say, it might be a bad idea to block it while Consumer is working. You've effectively made them take turns. What you can do (with only a modicum of care) is let Producer keep pushing work (TNodes) onto the back of the queue almost unhindered by Consumer. So if Consumer takes a while, Producer need not incur any latency overhead.

That is, make a design that doesn't have:

/* wait, blocking, for next != nullptr */

That's holding up

TNode* newNext = new TNode( );
// ... fill *newNext ...

for the next work item. NB: If Consumer logically has to finish before that can happen, the whole idea of parallelism for this task is scuppered and you may as well go sequential.

-1
votes

short answer:

The compiler is free to reorder memory accesses (or even elide them when not volatile), except that:

If you specify a store to be memory_order_release prior to a load specifying memory_order_acquire, then the compiler is required to respect your intent and not reorder the load to 'happen before' the store.

Sequential consistency will achieve this without giving maintainers headaches. It is also optimally efficient on the upcoming ARMv8, which will be the first processor to have the load-acquire and store-release instructions correctly implemented.

You can find everything you ever needed to know about C++ atomics in these two talks:
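As a small sketch of the sequentially consistent version, here is the store-then-load pattern from the question reduced to two int flags. next_set and waiting_set stand in for the two pointers, count_both_missed is a made-up name, and the consumer re-checking next after setting waiting is my assumption about the elided blocking code:

```cpp
#include <atomic>
#include <thread>

// Runs the store-then-load handshake `rounds` times and returns how often
// BOTH threads failed to see the other's store (a lost wakeup).
int count_both_missed(int rounds) {
    int both_missed = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> next_set{0};     // stands in for `next`
        std::atomic<int> waiting_set{0};  // stands in for `waiting`
        int producer_saw = -1;
        int consumer_saw = -1;

        std::thread producer([&] {
            next_set.store(1, std::memory_order_seq_cst);
            producer_saw = waiting_set.load(std::memory_order_seq_cst);
        });
        std::thread consumer([&] {
            waiting_set.store(1, std::memory_order_seq_cst);
            consumer_saw = next_set.load(std::memory_order_seq_cst);
        });
        producer.join();
        consumer.join();

        // All four operations are seq_cst, so they share one total order
        // and at least one of the loads must observe the other's store.
        if (producer_saw == 0 && consumer_saw == 0) {
            ++both_missed;
        }
    }
    return both_missed;
}
```

With all four operations seq_cst the function should always return 0. That guarantee goes away once the stores are weakened to release and the loads to acquire, because the handshake depends on the single total order over seq_cst operations.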

https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2

https://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

I would suggest that they are required viewing before attempting anything with atomics.

After watching them, you will probably realise that you didn't know anything about atomics before, even though you thought you did. This was certainly the case for me.