3
votes

According to this interesting paper about undefined behavior optimization in c, the expression (x<<n)|(x>>32-n) "performs undefined behavior in C when n = 0". This stackoverflow discussion confirms that the behavior is undefined for negative integers, and discusses some other potential pitfalls with left-shifting values.

Consider the following code:

#include <stdio.h>
#include <stdint.h>

uint32_t rotl(uint32_t x, uint32_t n)
{
    return (x << n) | (x >> (32 - n));
}

int main()
{
    uint32_t y = rotl(10, 0);
    printf("%u\n", y);
    return 0;
}

Compile using the following parameters: -O3 -std=c11 -pedantic -Wall -Wextra

  • In gcc >5.1.0 the output of the program is 10.
  • In clang >3.7.0 the output is 4294967295.

Interestingly, this is still true when compiling with c++: gcc results, clang results.

Therefore, my questions are as follows:

  1. It is my understanding from the language in the standard that this should not invoke undefined / implementation defined behavior since both of the parameters are unsigned integers and none of the values are negative. Is this correct? If not, what is the relevant section of the standard for c11 and c++11?
  2. If the previous statement is true, which compiler is producing the correct output according to the c/c++ standard? Intuitively, left shifting by no digits should give you back the value, i.e. what gcc outputs.
  3. If the above is not the case, why are there no warnings that this code may invoke undefined behavior due to left-shift overflow?
5

5 Answers

7
votes

From [expr.shift], emphasis mine:

The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

You are doing:

(x >> (32 - n))

with n == 0, so you're right-shifting a 32-bit number by 32. Hence, UB.

2
votes

Your n is 0, so performing x << 32 is an undefined behavior as shifting uint32_t 32 bits or more is undefined.

1
votes

If n is 0, 32-n is 32, and since x has 32 bits, x>>(32-n) is UB.

The issue in the linked SO post is different. This one has nothing to do with signedness.

1
votes

A part of the post not fully answered.

why are there no warnings that this code may invoke undefined behavior due to left-shift overflow?

Looking at the add() code, what should the compiler warn about? Is it UB if the sum is outside the range of INT_MIN ... INT_MAX. Because the following code does not take precautions to prevent overflow, like here, should it warn? Should you think so, then so much code would be waning about potential this and that, that programmers would quickly turn that warning off.

int add(int a, int b) {
  return a + b;
}

The situation is not much different here. If n > 0 && n < 32, there is no problem.

uint32_t rotl(uint32_t x, uint32_t n) {
  return (x << n) | (x >> (32 - n));
}

C creates fast code primarily because it lacks lots of run-time error checking and compliers are able to perform very nice optimized code. If one needs lots of run-time checks, there are other languages suitable for those programmers.

C is coding without a net.

1
votes

When the C Standard was written, some implementations would behave weirdly when trying to perform a shift by extremely large or negative amounts, e.g. left-shifting by -1 might tie up a CPU with interrupts disabled while its microcode shifts a value four billion times, and disabling interrupts for that long might cause other system faults. Further, while few if any implementations would do anything particularly weird when shifting by exactly the word size, implementations weren't consistent about the value returned. Some would treat it as a shift by zero, while others would yield the same result as shifting by one, word-size times, and some would sometimes do one and sometimes the other.

If the authors of the Standard had specified that shifting by precisely the word size may select in Unspecified fashion between those two possible behaviors, that would have been useful, but the authors of the Standard weren't interested in specifying all the things that compilers would naturally do with or without a mandate. I don't think they considered the idea that implementations for commonplace platforms wouldn't naturally yield the commonplace behavior for expressions like the "rotate" given above, and didn't want to clutter the Standard with such details.

Today, however, some compiler writers think it's more important to exploit all forms of UB for "optimization" than to support useful natural behaviors which had previously been supported by essentially all commonplace implementations. Whether or not making the "rotate" expression malfunction when y==0 would allow a compiler to generate a useful program which is smaller than would otherwise be possible is irrelevant.