25 votes

This code gives different results for -O1 and -O2:

/*
    Example of a clang optimization bug.
    Mark Adler, August 8, 2015.

    Using -O0 or -O1 takes a little while and gives the correct result:

        47 bits set (4294967296 loops)

    Using -O2 or -O3 optimizes out the loop, returning immediately with:

        0 bits set (4294967296 loops)

    Of course, there weren't really that many loops.  The number of loops was
    calculated, correctly, by the compiler when optimizing.  But it got the
    number of bits set wrong.

    This is with:

        Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
        Target: x86_64-apple-darwin14.4.0

 */

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = 0;
    uint64_t loops = 0;
    uint32_t x = 0;
    do {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    } while (x);
    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}

So is this a bug? Or is there somehow an undefined behavior in there, that the compiler is within its rights to give different results for?

As far as I can tell from the C99 standard, the do loop that walks through every uint32_t value is valid, since incrementing the largest unsigned integer value is well defined to yield zero (C99 §6.2.5p9):

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

@black vec is not on the stack - Joe
0xb9fe2f2fedf7ebbd is too large to be an int, you should add ULL. - mch
Can't reproduce either with gcc-5.1.1 or clang3.6.0 on my 32 bit system. - P.P
@mch: The ULL suffix is not necessary. The type of a hexadecimal integer constant is of the first of int, unsigned int, long int, unsigned long int, long long int, unsigned long long int in which its value can be represented. 0xb9fe2f2fedf7ebbd is of type unsigned long int on a system with 64-bit long, or unsigned long long int on a system with 32-bit long. - Keith Thompson
What difference do you see in the generated code? The only issue I see is the printf formats for uint64_t. Use: printf("%" PRIu64 " %" PRIu64 "\n", count, loops); (might not make any difference if uint64_t is the same type as unsigned long long). - P.P

3 Answers

28 votes

I'm reasonably sure this is a bug in clang. I see no undefined behavior in the program (assuming that it doesn't exceed the implementation's capacity limits) -- other than a small problem in the printf call that I'll address below (and that has now been addressed in an edit to the question). It's possible that I've missed something, but I don't think so.

If I have missed something, I expect it to be pointed out shortly. If this answer remains uncontradicted after a few days, I'll take it as a strong indication that it is indeed a bug in clang.

UPDATE: Mark Adler, the original poster, has reported this and confirmed that it's a bug in pre-3.6.0 clang, corrected in later versions. I will shamelessly steal this link to the bug report from his answer.

The correct output is:

47 bits set (4294967296 loops)

To address some of the things that have been pointed out (or that I've noticed myself):

static uint64_t vec[1 << 26] = {0};

This is a large object (2^29 bytes, or half a gigabyte, assuming CHAR_BIT==8), but it apparently does not exceed the implementation's capacity. If it did, it would be rejected. I'm not 100% sure the standard requires this, but since the program does work correctly at lower optimization levels, we can assume that the object is not too large.

vec[31415927] = 0xb9fe2f2fedf7ebbd

The constant 0xb9fe2f2fedf7ebbd is not a problem. Its value is between 2^63 and 2^64, so it's within the range of uint64_t. The type of a hexadecimal integer constant is wide enough to hold its value (unless it exceeds ULLONG_MAX, but that's not the case here).

if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))

I briefly thought that the left shift might be a problem, but it isn't. The left operand is of type uint64_t, and the right operand is in the range 0 .. 63. A left shift of 64 bits would have undefined behavior, but that's not the case here.

printf("%llu bits set (%llu loops)\n", count, loops);

The following has been addressed by an update to the question. I've tried the updated version of the code and I got the same results.

%llu requires an argument of type unsigned long long; count and loops are of type uint64_t. Here, depending on the implementation, we might have undefined behavior (on my system uint64_t is a typedef for unsigned long, and I get a warning). But it's not likely to cause any actual problems (unsigned long long and uint64_t typically have the same representation even if they're not the same type), and when I add casts to avoid any UB:

printf("%llu bits set (%llu loops)\n",
       (unsigned long long)count,
       (unsigned long long)loops);

I get the same behavior. The following results are for a program with casts added to the printf call.

Using gcc 5.2.0 on my 64-bit system, I get the correct output with -O0, -O1, -O2, and -O3, with or without -m32. The timing indicates that gcc does not eliminate the loop at any optimization level.

Using clang 3.4 on the same system, I get the correct output with -O0 or -O1, but incorrect output (0 bits set) at -O2 or -O3. Timing indicates that the loop is eliminated at -O2 and -O3. When I compile with clang -m32, the output is correct (and the loop is not eliminated) at all optimization levels.

When I change the declaration of loops to

volatile uint64_t loops = 0;

I get correct output at all optimization levels (and the loop is not eliminated).

A further tweak to the program (not shown here) shows that vec[31415927] really is set to 0xb9fe2f2fedf7ebbd, even when optimization produces the wrong bit count.

16 votes

It is a bug in pre-3.6.0 clang. (The "3.6.0svn" version precedes 3.6.0.) Since it was already fixed in the 3.6.0 release about five months ago, I have reported the bug to Apple, whose most recent distribution of compiler tools still ships the affected version.

6 votes

It does look like a bug in clang. I can reproduce it on my 64-bit system running clang 3.4-1ubuntu3; as the other answer mentions, I always get correct output with gcc (which never optimizes away the loop), but clang optimizes away the loop at -O2 and -O3.

This answer doesn't add much to Keith's thorough and outstanding answer, but for future reference I want to show a possible workaround (other than volatile).

Indeed, making any of x, count, or loops volatile will fix it, but after some experimenting, I determined that the bug appears to manifest itself only in do { ... } while (...); loops.

If you change the code to use a while or a for loop (and make the appropriate changes to maintain the program's behavior), clang will always produce the correct output and the loop is not optimized away (but it still runs faster with -O3).

Here's an example:

#include <stdio.h>
#include <inttypes.h>

/* bit vector of 1<<32 bits, initialized to all zeros */
static uint64_t vec[1 << 26] = {0};

int main(void)
{
    /* set 47 of the bits. */
    vec[31415927] = UINT64_C(0xb9fe2f2fedf7ebbd);

    /* count the set bits */
    uint64_t count = vec[0] & (uint64_t)1;
    uint64_t loops = 1;
    uint32_t x = 1;

    while (x) {
        if (vec[x >> 6] & ((uint64_t)1 << (x & 0x3f)))
            count++;
        x++;
        loops++;
    }

    printf("%" PRIu64 " bits set (%" PRIu64 " loops)\n", count, loops);
    return 0;
}