Does standard C++11 guarantee that `volatile atomic<T>` has both semantics (volatile + atomic)?

Question

As known, std::atomic and volatile are different things.

There are 2 main differences:

1. Two optimizations are permitted for std::atomic<int> a; but not for volatile int a;:
- fused operations: a = 1; a = 2; can be replaced by the compiler with a = 2;
- constant propagation: a = 1; local = a; can be replaced by the compiler with a = 1; local = 1;
2. Reordering of ordinary reads/writes across atomic/volatile operations:
- for volatile int a; volatile reads/writes cannot be reordered with respect to each other, but nearby ordinary reads/writes can still be reordered around them.
- for std::atomic<int> a; reordering of nearby ordinary reads/writes is restricted according to the memory order used for the atomic operation, e.g. a.load(std::memory_order_...);

I.e. volatile does not introduce memory fences, but std::atomic can.
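To make the first difference concrete, here is a minimal sketch. The globals and function names are hypothetical and exist only for this illustration. The standard permits (but does not require) a compiler to merge the two atomic stores and forward the constant; in practice many compilers currently leave atomics alone, so inspect the generated assembly rather than assume either outcome.

```cpp
#include <atomic>

// Hypothetical globals, used only for this illustration.
std::atomic<int> atomic_a{0};
volatile int volatile_a = 0;

// The compiler is allowed (not required) to drop the first store and to
// forward the constant 2 into the return value.
int write_twice_atomic() {
    atomic_a.store(1, std::memory_order_relaxed);
    atomic_a.store(2, std::memory_order_relaxed);
    return atomic_a.load(std::memory_order_relaxed);
}

// Every volatile access is an observable side effect, so both stores and
// the load have to be emitted in program order.
int write_twice_volatile() {
    volatile_a = 1;
    volatile_a = 2;
    return volatile_a;
}
```

Both functions return the same value in a single-threaded run; the difference is only visible in the emitted instructions.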

As is well described in the article:

Herb Sutter, January 08, 2009 - part 1: http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484
Herb Sutter, January 08, 2009 - part 2: http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2

For example, std::atomic should be used for concurrent multi-threaded programs (CPU-Core <-> CPU-Core), but volatile should be used for access to memory-mapped regions on devices (CPU-Core <-> Device).

But sometimes a variable needs both: it has unusual semantics and also needs any or all of the atomicity and/or ordering guarantees needed for lock-free coding. That is, volatile std::atomic<> may be required for several reasons:

ordering: to prevent reordering of ordinary reads/writes, for example reads from CPU RAM to which data has been written by the device's DMA controller

For example:

char cpu_ram_data_written_by_device[1024];
device_dma_will_write_here( cpu_ram_data_written_by_device );

// physically mapped to device register
volatile bool *device_ready = get_pointer_device_ready_flag();

//... somewhere much later
while(!*device_ready); // spin-wait on the flag value, not the pointer (a memory fence is needed here!)
for(auto &i : cpu_ram_data_written_by_device) std::cout << i;
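For comparison, this is roughly what the wait loop might look like if the flag could be declared volatile std::atomic<bool>. It is only a sketch: fake_device_write() and dma_buffer are hypothetical stand-ins that live in ordinary memory, and whether a C++ acquire operation is sufficient with respect to a real DMA device is platform-specific and outside what the standard specifies.

```cpp
#include <atomic>

// Hypothetical stand-ins: on real hardware the flag would be a
// memory-mapped register and the buffer would be filled by DMA.
volatile std::atomic<bool> device_ready{false};
char dma_buffer[4] = {0, 0, 0, 0};

// Plays the role of the device: fill the buffer, then publish the flag.
void fake_device_write() {
    for (char &c : dma_buffer) c = 1;
    device_ready.store(true, std::memory_order_release);
}

// Consumer side: the acquire load keeps the buffer reads below from
// being moved ahead of the flag check.
int wait_and_sum() {
    while (!device_ready.load(std::memory_order_acquire)) {
        // spin
    }
    int sum = 0;
    for (char c : dma_buffer) sum += c;
    return sum;
}
```

Calling fake_device_write() and then wait_and_sum() exercises the path without a second thread; with a real producer the loop would spin until the flag is set.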

spilling: the CPU writes to CPU RAM and then the device's DMA controller reads from this memory: https://en.wikipedia.org/wiki/Register_allocation#Spilling

example:

char cpu_ram_data_will_read_by_device[1024];
device_dma_will_read_it( cpu_ram_data_will_read_by_device );

// physically mapped to device register
volatile bool *data_ready = get_pointer_data_ready_flag();

//... somewhere much later
for(auto &i : cpu_ram_data_will_read_by_device) i = 10;
*data_ready = true; // cpu_ram_data_will_read_by_device must be spilled to RAM before this write; a memory fence is needed here
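The producer direction could be sketched the same way, again assuming the flag may be declared volatile std::atomic<bool>. The names cpu_buffer and fill_and_publish() are hypothetical, and the release store only has standard-defined meaning between threads; what a device observes depends on the platform.

```cpp
#include <atomic>

// Hypothetical buffer and flag for illustration only.
char cpu_buffer[8] = {};
volatile std::atomic<bool> data_ready{false};

// Fill the buffer, then publish. The release store prevents the buffer
// writes from being moved after the flag write, so they cannot stay
// cached in registers past this point.
void fill_and_publish() {
    for (char &c : cpu_buffer) c = 10;
    data_ready.store(true, std::memory_order_release);
}
```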

atomic: to guarantee that the volatile operation is atomic, i.e. it consists of a single operation instead of several, e.g. one 8-byte operation instead of two 4-byte operations
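Whether an 8-byte atomic really maps to a single hardware operation is implementation-defined, so it may be worth checking rather than assuming. A small sketch, with a hypothetical variable name, using is_lock_free() and the ATOMIC_LLONG_LOCK_FREE macro from <atomic>:

```cpp
#include <atomic>

// Hypothetical 8-byte value; the name is for illustration only.
volatile std::atomic<long long> wide_value{0};

// True when the implementation uses native atomic instructions for this
// object instead of falling back to an internal lock.
bool wide_value_is_lock_free() {
    return wide_value.is_lock_free();
}

// Stores and reloads a value whose set bits span both 4-byte halves.
long long store_and_reload(long long v) {
    wide_value.store(v, std::memory_order_release);
    return wide_value.load(std::memory_order_acquire);
}
```

ATOMIC_LLONG_LOCK_FREE is 2 when the type is always lock-free, 1 when it is sometimes lock-free, and 0 when it never is. The operation is atomic in every case, but only the lock-free case corresponds to a single hardware access.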

On this point, Herb Sutter wrote about volatile atomic<T> (January 08, 2009): http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2

Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic.

But do the published standards C++11 (not the C++0x draft), C++14, and C++17 guarantee that volatile atomic<T> has both semantics (volatile + atomic)?

Does volatile atomic<T> provide the strictest guarantees of both volatile and atomic?

As in volatile: avoids fused operations and constant propagation, as described at the beginning of the question
As in std::atomic: introduces memory fences to provide ordering, spilling, and atomicity.

And can we do reinterpret_cast from volatile int *ptr; to volatile std::atomic<int>*?

Let me throw in a short comment. volatile atomic<T> over atomic<volatile T> and why would you want to do the reinterpret_cast? It will probably work, but not guaranteed. — DeiDei
You can't have std::atomic<volatile T> because a volatile type is not trivially copyable. — Brian Bi
@Brian Yes you are right. Removed about std::atomic<volatile T>. — Alex
@DeiDei If driver-API returns volatile int *ptr; and I want to use code while(ptr->load(std::memory_order_acquire) == 0); instead of while(*ptr == 0); std::atomic_thread_fence(std::memory_order_acquire); — Alex
" one 8-byte-operation instead of two 4-byte-operations" - atomic doesn't guarantee that. It could very well take a lock and then do two 4-byte writes. ATOMIC_LONG_LOCK_FREE could be 0 to say "never lock-free". — Bo Persson

vll · Accepted Answer · 2016-11-30T09:35:48

Yes, it does.

Section 29.6.5, "Requirements for operations on atomic types"

Many operations are volatile-qualified. The “volatile as device register” semantics have not changed in the standard. This qualification means that volatility is preserved when applying these operations to volatile objects.

I checked the working drafts from 2008 through 2016, and the same text is in all of them. Therefore it should apply to C++11, C++14, and C++17.
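As a small illustration of what "volatile-qualified" means in practice: the member functions of std::atomic have volatile overloads, so they can be called on a volatile object. The variable and function names below are hypothetical, and this sketch only shows that such code compiles and behaves atomically; it cannot by itself demonstrate that a compiler refrains from merging the operations.

```cpp
#include <atomic>

// Hypothetical object standing in for a device-visible counter.
volatile std::atomic<int> reg{0};

// Each call resolves to the volatile-qualified overload of fetch_add / load.
int bump_twice() {
    reg.fetch_add(1, std::memory_order_acq_rel);
    reg.fetch_add(1, std::memory_order_acq_rel);
    return reg.load(std::memory_order_acquire);
}
```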
