I have a multi-threaded C++ application running on a 32-core Intel Xeon, compiled with GCC 4.8.2 with optimizations enabled.
I have multiple threads (say A, B, C) that update some POD variables, and another thread D that reads those variables every K seconds and sends them to a GUI. The threads are spread across multiple cores and sockets. The writes are protected by a spin-lock. Threads A, B, C are latency sensitive, and high performance is critical for them. Thread D is not latency sensitive.
Something like:
Thread A,B,C:
    ...
    // a,b,c are up to 64 bits (let's say double)
    spin-lock
    a = computeValue();
    b = computeValue();
    c = computeValue();
    spin-unlock
    ...
Thread D:
    ...
    // a,b,c are up to 64 bits (let's say double)
    currValueA = a;
    currValueB = b;
    currValueC = c;
    sendToGui(currValueA, currValueB, currValueC);
    ...
I want to take advantage of Section 8.1.1 of https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html, about guaranteed atomic operations, and avoid putting a lock around the reads made by thread D.
My understanding is that if a, b, c are naturally aligned (with a size no bigger than 64 bits), there is no risk that thread D could read a value of a, b, or c that was captured halfway through a write. In other words, each individual write and read will be carried out atomically, and thread D will read either the old value or the new one.
Is my understanding correct?
I left alignment to the compiler (GCC 4.8.2), i.e. I don't use any alignment specifiers or operators such as alignas or alignof.
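To make the alignment assumption concrete, here is a minimal sketch of how it could be checked rather than assumed. The struct SharedValues and the helpers isNaturallyAligned and checkAlignment are hypothetical names used only for illustration, not part of my actual code, and the alignas specifiers are only there to state the requirement explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container for the shared values a, b, c.
// On x86-64 a double is already 8-byte aligned by the ABI, so alignas
// is redundant there; it only makes the requirement explicit.
struct SharedValues {
    alignas(8) double a;
    alignas(8) double b;
    alignas(8) double c;
};

// Compile-time checks: each member is 8 bytes wide and sits on an
// 8-byte boundary relative to the start of the struct.
static_assert(sizeof(double) == 8, "expected a 64-bit double");
static_assert(alignof(SharedValues) >= 8, "struct is not 8-byte aligned");
static_assert(offsetof(SharedValues, a) % 8 == 0, "a is not naturally aligned");
static_assert(offsetof(SharedValues, b) % 8 == 0, "b is not naturally aligned");
static_assert(offsetof(SharedValues, c) % 8 == 0, "c is not naturally aligned");

// Run-time check: true if the address p is a multiple of size.
inline bool isNaturallyAligned(const void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

// Debug-only sanity check for a live object (for example one placed
// in a custom buffer); compiled out when NDEBUG is defined.
inline void checkAlignment(const SharedValues& v) {
    assert(isNaturallyAligned(&v.a, sizeof v.a));
    assert(isNaturallyAligned(&v.b, sizeof v.b));
    assert(isNaturallyAligned(&v.c, sizeof v.c));
}
```

Note that this only verifies alignment. It does not stop the optimizer from caching, merging, or reordering plain (non-atomic) loads and stores.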
I am aware that the code is not portable. I would prefer not to use std::atomic to avoid any unnecessary overhead.
std::atomic and specify target architecture, compiler might do the optimization by itself. So your code would be portable. let compiler job to compiler. – Jarod42
std::atomic? – Pete Becker
std::atomic with memory_order_relaxed won't have any overhead at all for pure loads and pure stores (and will ensure correct alignment for 64-bit values even in 32-bit code, where alignof(int64_t)=4 on Linux), but it will interfere with auto-vectorization. (Avoid using any atomic RMW operations, of course). If you care about performance, consider using a newer compiler, like gcc7 or gcc8. There have been various improvements since 4.8. – Peter Cordes
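As an illustration of the last comment, a minimal sketch of what relaxed std::atomic loads and stores could look like for this case. The names Stats, Values, publish, and readValues are hypothetical and chosen only for this example; the spin-lock from the question is left out, and whether it is still needed depends on what else it protects.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state mirroring a, b, c from the question.
struct Stats {
    std::atomic<double> a{0.0};
    std::atomic<double> b{0.0};
    std::atomic<double> c{0.0};
};

// Plain copy of the three values as seen by the reader.
struct Values {
    double a;
    double b;
    double c;
};

// Writer side (threads A, B, C): pure stores only, no read-modify-write.
inline void publish(Stats& s, double a, double b, double c) {
    s.a.store(a, std::memory_order_relaxed);
    s.b.store(b, std::memory_order_relaxed);
    s.c.store(c, std::memory_order_relaxed);
}

// Reader side (thread D): pure loads only. Each value is read without
// tearing, but the three loads are independent, so the result is not
// guaranteed to come from a single call to publish.
inline Values readValues(const Stats& s) {
    Values out;
    out.a = s.a.load(std::memory_order_relaxed);
    out.b = s.b.load(std::memory_order_relaxed);
    out.c = s.c.load(std::memory_order_relaxed);
    return out;
}
```

In this sketch thread D would call readValues and pass the result to sendToGui. If the GUI needs a, b, and c to belong to the same update, per-variable atomicity alone is not enough, with or without std::atomic.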