Background
This was inspired by this question/answer and the ensuing discussion in the comments: Is the definition of “volatile” this volatile, or is GCC having some standard compliancy problems? Based on others' and my interpretation of what should be happening, as discussed in the comments, I've submitted it to GCC Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71793 Other relevant responses are still welcome.
Also, that thread has since given rise to this question: Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?
Intro
I know volatile isn't what most people think it is and is an implementation-defined nest of vipers. And I certainly don't want to use the below constructs in any real code. That said, I'm totally baffled by what's going on in these examples, so I'd really appreciate any elucidation.
My guess is this is due to either highly nuanced interpretation of the Standard or (more likely?) just corner-cases for the optimiser used. Either way, while more academic than practical, I hope this is deemed valuable to analyse, especially given how typically misunderstood volatile is. Some more data points - or perhaps more likely, points against it - must be good.
Input
Given this code:
#include <cstddef>

void f(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    volatile unsigned char const x = 42;
    // N.B. Yeah, const is weird, but it doesn't change anything

    while (n--) {
        *y++ = x;
    }
}

void g(void *const p, std::size_t n, volatile unsigned char const x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

void h(void *const p, std::size_t n, volatile unsigned char const &x)
{
    unsigned char *y = static_cast<unsigned char *>(p);

    while (n--) {
        *y++ = x;
    }
}

int main(int, char **)
{
    int y[1000];
    f(&y, sizeof y);
    volatile unsigned char const x{99};
    g(&y, sizeof y, x);
    h(&y, sizeof y, x);
}
Output
g++ from gcc (Debian 4.9.2-10) 4.9.2 (Debian stable a.k.a. Jessie) with the command line g++ -std=c++14 -O3 -S test.cpp produces the below ASM for main(). Version Debian 5.4.0-6 (current unstable) produces equivalent code, but I just happened to run the older one first, so here it is:
main:
.LFB3:
    .cfi_startproc
    # f()
    movb    $42, -1(%rsp)
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L21:
    subq    $1, %rax
    movzbl  -1(%rsp), %edx
    jne     .L21
    # x = 99
    movb    $99, -2(%rsp)
    movzbl  -2(%rsp), %eax
    # g()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L22:
    subq    $1, %rax
    jne     .L22
    # h()
    movl    $4000, %eax
    .p2align 4,,10
    .p2align 3
.L23:
    subq    $1, %rax
    movzbl  -2(%rsp), %edx
    jne     .L23
    # return 0;
    xorl    %eax, %eax
    ret
    .cfi_endproc
Analysis
All 3 functions are inlined, and both that allocate volatile local variables do so on the stack for fairly obvious reasons. But that's about the only thing they share...
- f(): makes sure to read from x on each iteration, presumably due to its volatile - but just dumps the result to edx, presumably because the destination y isn't declared volatile and is never read, meaning changes to it can be nixed under the as-if rule. OK, makes sense.
  - Well, I mean... kinda. Like, not really, because volatile is really for hardware registers, and clearly a local value can't be one of those - and can't otherwise be modified in a volatile way unless its address is passed out... which it's not. Look, there's just not a lot of sense to be had out of volatile local values. But C++ lets us declare them and tries to do something with them. And so, confused as always, we stumble onwards.
- g(): What. By moving the volatile source into a pass-by-value parameter, which is still just another local variable, GCC somehow decides it's not, or is less, volatile, and so it doesn't need to read it every iteration... but it still carries out the loop, despite its body now doing nothing.
- h(): By taking the passed volatile as pass-by-reference, the same effective behaviour as f() is restored, so the loop does volatile reads.
  - This case alone actually makes practical sense to me, for reasons outlined above against f(). To elaborate: Imagine x refers to a hardware register, of which every read has side-effects. You wouldn't want to skip any of those.
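To make the hardware-register reading of h() concrete, here is a minimal sketch. The names fake_register and drain are made up for illustration, and the "register" is just an ordinary global standing in for a memory-mapped address that a real platform would supply.

```cpp
#include <cstddef>

// Hypothetical stand-in for a device register. On real hardware this would
// be a fixed memory-mapped address, not an ordinary global variable.
volatile unsigned char fake_register = 99;

// Same shape as h(): the source is reached through a reference to volatile,
// so each loop iteration performs one volatile read of reg.
void drain(unsigned char *dst, std::size_t n, volatile unsigned char const &reg)
{
    while (n--) {
        *dst++ = reg;   // one volatile read per iteration
    }
}
```

Here the compiler cannot know what lives behind reg, which is presumably why keeping every read is the uncontroversial outcome in this case.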
Adding #define volatile /**/ leads to main() being a no-op, as you'd expect. So, when present, even on a local variable volatile does do something... I just have no idea what in the case of g(). What on Earth is going on there?
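As far as I understand [macro.names], a translation unit that includes a standard library header isn't supposed to #define a keyword, so the trick above is itself on shaky ground. A tamer way to run the same experiment is sketched below. The names NO_VOLATILE, MAYBE_VOLATILE and f_toggle are made up for illustration. Build once normally and once with -DNO_VOLATILE, then compare the assembly.

```cpp
#include <cstddef>

// Hypothetical switch: compile with -DNO_VOLATILE to drop the qualifier.
#ifdef NO_VOLATILE
#  define MAYBE_VOLATILE
#else
#  define MAYBE_VOLATILE volatile
#endif

// Same body as f(), with the qualifier under the macro's control.
void f_toggle(void *const p, std::size_t n)
{
    unsigned char *y = static_cast<unsigned char *>(p);
    MAYBE_VOLATILE unsigned char const x = 42;

    while (n--) {
        *y++ = x;
    }
}
```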
Questions
- Why does a local value declared in-body produce different results from a by-value parameter, with the latter letting reads be optimised away? Both are declared volatile. Neither has its address passed out - and neither has a static address, ruling out any inline-ASM POKEry - so they can never be modified outwith the function. The compiler can see that each is constant, need never be re-read, and volatile just ain't true -
  - so (A) is either allowed to be elided under such constraints? (acting as-if they weren't declared volatile) -
  - and (B) why does only one get elided? Are some volatile local variables more volatile than others?
- Setting aside that inconsistency for just a moment: After the read was optimised away, why does the compiler still generate the loop? It does nothing! Why doesn't the optimiser elide it as-if no loop was coded?
Is this a weird corner case due to order of optimising analyses or such? As the code is a daft thought-experiment, I wouldn't chastise GCC for this, but it'd be good to know for sure. (Or is g() the manual timing loop people have dreamt of all these years?) If we conclude there's no Standard bearing on any of this, I'll move it to their Bugzilla just for their information.
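For comparison, this is roughly what people usually mean by a manual timing loop: a counter that is itself volatile, so that every step is a volatile access. It's only a sketch, the name spin is made up, and the delay it gives has no calibrated duration. Whether a compiler must keep such accesses for a local whose address never escapes is, of course, the very thing being asked here.

```cpp
#include <cstddef>

// Hypothetical busy-wait helper: counts up to n through a volatile counter
// and returns the final count.
std::size_t spin(std::size_t n)
{
    volatile std::size_t i = 0;

    while (i < n) {   // volatile read of i for the comparison
        i = i + 1;    // one more volatile read, then a volatile write
    }
    return i;         // final volatile read
}
```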
And of course, the more important question from a practical perspective, though I don't want that to overshadow the potential for compiler geekery... Which, if any of these, are well-defined/correct according to the Standard?
g is a compiler bug according to the Standard – M.M
... x, once for each loop iteration. It would be non-conforming if the system (be it the compiler, or the CPU or whatever) combined all of those to a single read. – M.M