20
votes

I'm seeing behaviour I don't expect when compiling this code with different optimization levels in gcc.

The function test should fill a 64 bit unsigned integer with ones, shift them shift_size bits to the left, and return the 32 low bits as a 32 bit unsigned integer.

When I compile with -O0 I get the results I expect.

When I compile with -O2 I do not, if I try to shift by 32 bits or more.

In fact, I get exactly the results I'd expect if I were shifting a 32 bit integer by shifts greater than or equal to bit width on x86, which is a shift using only the 5 low bits of the shift size.

But I'm shifting a 64 bit number, so shifts < 64 should be legal right?

I assume it's a bug in my understanding and not in the compiler, but I haven't been able to figure it out.

My machine: gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 i686-linux-gnu

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

uint32_t test(unsigned int shift_size) {
    uint64_t res = 0;
    res = ~res;
    res = res << shift_size; //Shift size < uint64_t width so this should work
    return res; //Implicit cast to uint32_t
}

int main(int argc, char *argv[])
{
    int dst;
    sscanf(argv[1], "%d", &dst); //Get arg from outside so optimizer doesn't eat everything
    printf("%" PRIu32 "l\n", test(dst));
    return 0;
}

Usage:

$ gcc -Wall -O0 test.c 
$ ./a.out 32
0l
$ gcc -Wall -O2 test.c 
$ ./a.out 32
4294967295l

gcc -S -Wall -O0 test.c

gcc -S -Wall -O2 test.c

5
Could you also post the relevant parts of the assembly code generated by your compiler? (-S for gcc)Greg Hewgill
FWIW it gives the correct result (0l) for me on all optimisation levels from -O0 to -O3 using gcc 4.2.1, so I suspect you might have found a gcc bug.Paul R
hm, code works fine in both optimization levels for me... (gcc version 4.5.0 20100604, openSUSE 11.3 (x86_64))Bort
@Alex: if you try the non-LLVM gcc 4.2.1 with -O2 -m32 then you should see it (that's what I'm using)Paul R

5 Answers

5
votes

"%u" (or "%lu") and uint32_t are not necessarily compatible. Try

    #include <inttypes.h>

    //printf("%ul\n", test(dst));
    printf("%" PRIu32 "l\n", test(dst));

Printing a uint32_t value with a "%u" (or "%lu") specifier can invoke Undefined Behaviour.

4
votes

I was able to repro this. Here's the relevant bit of generated code with -O2:

    movl    $-1, %eax
    movl    $-1, %edx
    sall    %cl, %eax
    xorl    %edx, %edx
    testb   $32, %cl
    cmovne  %eax, %edx
    cmovne  %edx, %eax    ; This appears to be the instruction in error.
                          ; It looks as though gcc thought that %edx might still
                          ; be zero at this point, because if the shift count is
                          ; >= 32 then %eax should be zero after this.

and here is the equivalent bit with -O0:

    movl    -16(%ebp), %eax
    movl    -12(%ebp), %edx
    shldl   %cl,%eax, %edx
    sall    %cl, %eax
    testb   $32, %cl
    je      L3
    movl    %eax, %edx
    xorl    %eax, %eax         ; correctly zeros %eax if shift count >= 32
L3:
    movl    %eax, -16(%ebp)
    movl    %edx, -12(%ebp)

Compiler is:

i686-apple-darwin11-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

Thanks for posting your gcc -S output. I had a look and while it's slightly different, the critical part has the same error as what I saw on my machine.

3
votes

It looks like it might be a 32-bit-specific compiler bug to me. With the code from the question and gcc 4.2.1 I can reproduce the bug so long as I compile with gcc -m32 -O2 ....

However, if I add a debug printf:

uint32_t test(unsigned int shift_size) {
    uint64_t res = 0;
    res = ~res;
    res = res << shift_size; //Shift size < uint64_t width so this should work
    printf("res = %llx\n", res);
    return res; //Implicit cast to uint32_t
}

then the problem goes away.

The next step would be to look at the generated code to try and identify/confirm the bug.

1
votes

It looks like a bug. My guess is that the compiler has folded the last two lines from:

res = res << shift_size
return (uint32_t)res;

into:

return ((uint32_t)res) << shift_size;

The latter is now well-defined for 32 or larger.

-2
votes

I don't remember what C99 say, but it seems that in gcc, uint32_t contain at least 32 bits, and may contain more, so when you optimize it, it use 64bit var, which is faster in 64-bit machine.