131
votes

I believe I found a bug in GCC while implementing O'Neill's PCG PRNG. (Initial code on Godbolt's Compiler Explorer)

After multiplying oldstate by MULTIPLIER, (result stored in rdi), GCC doesn't add that result to INCREMENT, movabs'ing INCREMENT to rdx instead, which then gets used as the return value of rand32_ret.state

A minimum reproducible example (Compiler Explorer):

#include <stdint.h>

struct retstruct {
    uint32_t a;
    uint64_t b;
};

struct retstruct fn(uint64_t input)
{
    struct retstruct ret;

    ret.a = 0;
    ret.b = input * 11111111111 + 111111111111;

    return ret;
}

Generated assembly (GCC 9.2, x86_64, -O3):

fn:
  movabs rdx, 11111111111     # multiplier constant (doesn't fit in imm32)
  xor eax, eax                # ret.a = 0
  imul rdi, rdx
  movabs rdx, 111111111111    # add constant; one more 1 than multiplier
     # missing   add rdx, rdi   # ret.b=... that we get with clang or older gcc
  ret
# returns RDX:RAX = constant 111111111111 : 0
# independent of input RDI, and not using the imul result it just computed

Interestingly, modifying the struct to have the uint64_t as the first member produces correct code, as does changing both members to be uint64_t

x86-64 System V does return structs smaller than 16 bytes in RDX:RAX, when they're trivially copyable. In this case the 2nd member is in RDX because the high half of RAX is the padding for alignment or .b when .a is a narrower type. (sizeof(retstruct) is 16 either way; we're not using __attribute__((packed)) so it respects alignof(uint64_t) = 8.)

Does this code contain any undefined behaviour that would allow GCC to emit the "incorrect" assembly?

If not, this should get reported on https://gcc.gnu.org/bugzilla/

3
Comments are not for extended discussion; this conversation has been moved to chat.Samuel Liew♦

3 Answers

103
votes

I don't see any UB here; your types are unsigned so signed-overflow UB is impossible, and there's nothing weird. (And even if signed, it would have to produce correct outputs for inputs that don't cause overflow UB, like rdi=1). It's broken with GCC's C++ front-end as well.

Also, GCC8.2 compiles it correctly for AArch64 and RISC-V (to an madd instruction after using movk to construct constants, or RISC-V mul and add after loading the constants). If it was UB that GCC was finding, we'd generally expect it to find it and break your code for other ISAs as well, at least ones that have similar type widths and register widths.

Clang also compiles it correctly.

This appears to be a regression from GCC 5 to 6; GCC5.4 compiles is correctly, 6.1 and later don't. (Godbolt).

You can report this on GCC's bugzilla using the MCVE from your question.

It really looks like it's a bug in x86-64 System V struct-return handling, perhaps of structs containing padding. That would explain why it works when inlining, and when widening a to uint64_t (avoiding padding).

21
votes

This has been fixed on trunk/master.

Here is the relevant commit.

And this is a patch to fix the issue.

Based on a comment in the patch, the reload_combine_recognize_pattern function was trying to adjust USE insns.

14
votes

Does this code contain any undefined behaviour that would allow GCC to emit the "incorrect" assembly?

The behavior of the code presented in the question is well defined with respect to the C99 and later C language standards. In particular, C permits functions to return structure values without restriction.