Why doesn't std::atomic initialisation do atomic release so other threads can see the initialised value?

Question

Something very odd turned up during the thread sanitising of the proposed boost::concurrent_unordered_map and is recounted at this blog post. In short, bucket_type looks like this:

  struct bucket_type_impl
  {
    spinlock<unsigned char> lock;  // = 2 if you need to reload the bucket list
    atomic<unsigned> count; // count is used items in there
    std::vector<item_type, item_type_allocator> items;
    bucket_type_impl() : count(0), items(0) {  }
    ...

Yet the thread sanitiser claims that there is a race between the construction of a bucket_type and its first use, specifically when the count atomic is loaded from. It turns out that if you initialise a std::atomic<> via its constructor, that initialisation is not an atomic operation, so the memory location is not atomically released and is not guaranteed to be visible to other threads. This is counterintuitive given that it's an atomic and that most atomic operations default to memory_order_seq_cst. You must therefore explicitly do a release store after construction to initialise the atomic with a value visible to other threads.
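The workaround described above can be sketched as follows. This is a minimal illustration, not the actual boost::concurrent_unordered_map code: bucket_sketch and used_items are hypothetical names, and the spinlock and item vector are left out. Whether the extra store is sufficient depends on how the object's address reaches the other thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for bucket_type_impl; only the atomic counter is kept.
struct bucket_sketch {
    std::atomic<unsigned> count;

    bucket_sketch() : count(0) {
        // The member initialiser above is a plain, non-atomic initialisation.
        // Follow it with an explicit atomic store, as the question describes,
        // so that a reader whose acquire load reads this store also sees
        // the completed construction.
        count.store(0, std::memory_order_release);
    }
};

// Hypothetical reader: an acquire load that pairs with the release store.
inline unsigned used_items(const bucket_sketch& b) {
    return b.count.load(std::memory_order_acquire);
}
```

A consume load in place of the acquire load would match the speculative read described in the question more closely, at the price of relying on dependency ordering.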

Is there some extremely pressing reason why std::atomic with a value consuming constructor does not initialise itself with release semantics? If not, I think this is a library defect.

Edit: Jonathan's answer is the better one for the history as to why, but ecatmur's answer links to Alisdair's defect report on the matter, and how it was closed by simply adding a note to say construction offers no visibility to other threads. I'll therefore award the answer to ecatmur. Thanks to all who replied. I think the way is clear to ask for an extra constructor; it will at least stand out in the documentation that there is something unusual with the value consuming constructor.

Edit 2: I ended up raising this as a defect in the C++ language with the committee, and Hans Boehm, who chairs the Concurrency part, feels this is not an issue for the following reasons:

1. No C++ compiler as of 2014 treats consume differently from acquire. As you will never, in real world code, pass an atomic to another thread without going through some release/acquire, the initialisation of the atomic would be made visible to all threads using the atomic. I think this is fine until compilers catch up, and before that the Thread Sanitiser will warn on this.
2. If you're doing mismatched consume-acquire-release like I am (I am using a release-inside-lock/consume-outside-lock atomic to speculatively avoid a release-acquire spinlock where it was unnecessary) then you're a big enough boy to know you must manually store release atomics after construction. That is probably a fair point.

How can you access the atomic variable from one thread when you don't know that it has been constructed yet in another thread? Surely you need some other way to synchronize the threads to ensure it's always constructed first, and that way would involve release semantics. — interjay
@interjay That's a separate matter. You can examine the source code at github.com/ned14/boost.spinlock/blob/master/spinlock.hpp#L438 if you'd like to see how. — Niall Douglas
@interjay: To summarise, yes you release after rehash, but that means not a jot when you consume (not acquire) a speculative read of count before deciding to lock. — Niall Douglas

ecatmur · Accepted Answer · 2014-09-01T17:16:20

It's because the converting constructor is constexpr, and constexpr functions can't have side effects such as atomic semantics.

In DR846, Alisdair Meredith writes:

I'm not sure if the initialization is implied by use of constexpr keyword (which restricts the form of a constructor) but even if that is the case, I think it is worth spelling out explicitly as the inference would be far too subtle in that case.

The resolution for that defect (by Lawrence Crowl) was to document the constructor with the note:

[Note: Construction is not atomic. —end note]

The note was then expanded to the current wording, giving an example of a possible memory race (via memory_order_relaxed operations communicating the address of the atomic) in DR1478.

The reason that the converting constructor needs to be constexpr is (primarily) to allow static initialization. In DR768 we see:

Further discussion: why is the ctor labeled "constexpr"? Lawrence [Crowl] said this permits the object to be statically initialized, and that's important because otherwise there would be a race condition on initialization.
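To illustrate the static-initialisation point, here is a small sketch; the names g_hits and count_hits are hypothetical. Because the converting constructor is constexpr, the namespace-scope atomic below is constant-initialised before any dynamic initialisation runs, so no thread can observe it half-constructed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Namespace-scope atomic: the constexpr converting constructor makes this
// constant initialisation, which is done before any dynamic initialisation.
std::atomic<unsigned> g_hits(0);

// Hypothetical helper: start n_threads threads that each bump the counter
// per_thread times, then return the final value.
unsigned count_hits(unsigned n_threads, unsigned per_thread) {
    g_hits.store(0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([per_thread] {
            for (unsigned i = 0; i < per_thread; ++i)
                g_hits.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // join orders the increments before the load
    return g_hits.load();
}
```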

So: making the constructor constexpr eliminates race conditions on static-lifetime objects, at the cost of a race in dynamic-lifetime objects that only occurs in fairly contrived situations. For a race to occur, the memory location of the dynamic-lifetime atomic object must be communicated to another thread in a way that does not result in the value of the atomic object being also synchronized to that thread.
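The condition described above can be made concrete with a sketch; the names g_published and publish_and_read are hypothetical. The address of a freshly constructed atomic is handed to another thread through an atomic pointer. With a release store paired with an acquire load, as below, the construction happens-before the reader's access. Replacing both orderings with memory_order_relaxed would give the racy case that the standard's note warns about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical publication slot for the address of a dynamically created atomic.
std::atomic<std::atomic<unsigned>*> g_published(nullptr);

// Construct an atomic, publish its address, and return what the reader saw.
unsigned publish_and_read(unsigned initial) {
    g_published.store(nullptr);
    unsigned seen = 0;

    std::thread reader([&seen] {
        std::atomic<unsigned>* p;
        // The acquire load pairs with the release store below, so everything
        // sequenced before that store (including construction) is visible here.
        while ((p = g_published.load(std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        seen = p->load(std::memory_order_relaxed);
    });

    // The construction itself is not an atomic operation.
    std::atomic<unsigned>* fresh = new std::atomic<unsigned>(initial);
    // Publishing the address with release is what synchronises the value.
    g_published.store(fresh, std::memory_order_release);

    reader.join();
    g_published.store(nullptr);
    delete fresh;
    return seen;
}
```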
