25 votes

This code gives different results for -O1 and -O2:

/*
    Example of a clang optimization bug.
    Mark Adler, August 8, 2015.

    Using -O0 or -O1 takes a little while and gives the correct result:

        47 bits set (4294967296 loops)

    Using -O2 or -O3 optimizes out the loop, returning immediately with:

        0 bits set (4294967296 loops)

    Of course, there weren't really that many loops.  The number of loops was
    calculated, correctly, by the compiler when optimizing.  But it got the
    number of bits set wrong.

    This is with:

        Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
        Target: x86_64-apple-darwin14.4.0

 */

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = 0;
    uint64_t loops = 0;
    uint32_t x = 0;
    do {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    } while (x);
    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}

So is this a bug? Or is there somehow an undefined behavior in there, that the compiler is within its rights to give different results for?

As far as I can tell from the C99 standard, the do loop that walks through every uint32_t value is valid, since incrementing the largest unsigned integer value is well defined to yield zero (C99 §6.2.5p9):

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

@black vec is not on the stack - Joe
0xb9fe2f2fedf7ebbd is too large to be an int, you should add ULL. - mch
Can't reproduce either with gcc-5.1.1 or clang3.6.0 on my 32 bit system. - P.P
@mch: The ULL suffix is not necessary. The type of a hexadecimal integer constant is of the first of int, unsigned int, long int, unsigned long int, long long int, unsigned long long int in which its value can be represented. 0xb9fe2f2fedf7ebbd is of type unsigned long int on a system with 64-bit long, or unsigned long long int on a system with 32-bit long. - Keith Thompson
What difference do you see in the generated code? The only issue I see is the printf formats for uint64_t. Use: printf("%" PRIu64 " %" PRIu64 "\n", count, loops); (might not make any difference if uint64_t is the same type as unsigned long long). - P.P

3 Answers

28 votes

I'm reasonably sure this is a bug in clang. I see no undefined behavior in the program (assuming that it doesn't exceed the implementation's capacity limits) -- other than a small problem in the printf call that I'll address below (and that has now been addressed in an edit to the question). It's possible that I've missed something, but I don't think so.

If I have missed something, I expect it to be pointed out shortly. If this answer remains uncontradicted after a few days, I'll take it as a strong indication that it is indeed a bug in clang.

UPDATE: Mark Adler, the original poster, has reported this and confirmed that it's a bug in pre-3.6.0 clang, corrected in later versions. I will shamelessly steal this link to the bug report from his answer.

The correct output is:

47 bits set (4294967296 loops)

To address some of the things that have been pointed out (or that I've noticed myself):

static uint64_t vec[1 << 26] = {0};

This is a large object (2^29 bytes, or half a gigabyte, assuming CHAR_BIT==8), but it apparently does not exceed the implementation's capacity. If it did, it would be rejected. I'm not 100% sure the standard requires this, but since the program does work correctly at lower optimization levels, we can assume that the object is not too large.

vec[31415927] = 0xb9fe2f2fedf7ebbd

The constant 0xb9fe2f2fedf7ebbd is not a problem. Its value is between 2^63 and 2^64, so it's within the range of uint64_t. The type of a hexadecimal integer constant is wide enough to hold its value (unless it exceeds ULLONG_MAX, but that's not the case here).

if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))

I briefly thought that the left shift might be a problem, but it isn't. The left operand is of type uint64_t, and the right operand is in the range 0 .. 63. A left shift of 64 bits would have undefined behavior, but that's not the case here.

printf("%llu bits set (%llu loops)\n", count, loops);

The following has been addressed by an update to the question. I've tried the updated version of the code and I got the same results.

%llu requires an argument of type unsigned long long; count and loops are of type uint64_t. Here, depending on the implementation, we might have undefined behavior (on my system uint64_t is a typedef for unsigned long, and I get a warning). But it's not likely to cause any actual problems (unsigned long long and uint64_t typically have the same representation even if they're not the same type), and when I add casts to avoid any UB:

printf("%llu bits set (%llu loops)\n",
       (unsigned long long)count,
       (unsigned long long)loops);

I get the same behavior. The following results are for a program with casts added to the printf call.

Using gcc 5.2.0 on my 64-bit system, I get the correct output with -O0, -O1, -O2, and -O3, with or without -m32. The timing indicates that gcc does not eliminate the loop at any optimization level.

Using clang 3.4 on the same system, I get the correct output with -O0 or -O1, but incorrect output (0 bits set) at -O2 or -O3. Timing indicates that the loop is eliminated at -O2 and -O3. When I compile with clang -m32, the output is correct (and the loop is not eliminated) at all optimization levels.

When I change the declaration of loops to

volatile uint64_t loops = 0;

I get correct output at all optimization levels (and the loop is not eliminated).

A further tweak to the program (not shown here) shows that vec[31415927] really is set to 0xb9fe2f2fedf7ebbd, even when optimization produces the wrong bit count.

16 votes

It is a bug in pre-3.6.0 clang. (The "3.6.0svn" version precedes 3.6.0.) Since it was already fixed in the 3.6.0 release about five months ago, I have reported the bug to Apple, whose most recent distribution of compiler tools still ships the affected version.

6 votes

It does look like a bug in clang. I can reproduce it on my 64-bit system running clang 3.4-1ubuntu3; as the other answer mentions, I always get correct output with gcc (which never optimizes away the loop), but clang optimizes away the loop at -O2 and -O3.

This answer doesn't add much to Keith's thorough and outstanding answer, but for future reference I want to show a possible workaround (other than volatile).

Indeed, making any of x, count, or loops volatile will fix it, but after some experimenting, I determined that the bug appears to manifest itself only in do { ... } while (...); loops.

If you change the code to use a while or a for loop (and make the appropriate changes to maintain the program's behavior), clang will always produce the correct output and the loop is not optimized away (but it still runs faster with -O3).

Here's an example:

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = vec[0] & (uint64_t)1;
    uint64_t loops = 1;
    uint32_t x = 1;

    while (x) {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    }

    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}