13
votes

Is left-shifting a negative int Undefined Behavior in C++11?

The relevant Standard passages here are from 5.8:

2/The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The part that confuses me is:

Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

Should this be interpreted to mean that left-shifting any negative number is UB? Or does it only mean if you LS a negative and the result doesn't fit in the result type, then it's UB?

Moreover, the preceding clause says:

1/The shift operators << and >> group left-to-right. shift-expression: additive-expression shift-expression << additive-expression shift-expression >> additive-expression

The operands shall be of integral or unscoped enumeration type and integral promotions are performed.

The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

This makes it explicit that using a negative number for one of the operands is UB. If it were UB to use a negative for the other operand, I would expect that to be made clear here as well.

So, bottom line, is:

-1 << 1

Undefined Behavior?


@Angew provided a psudocode interpretation of the Standardese which succinctly expresses one possible (likely) valid interpretation. Others have questioned whether this question is really about the applicability of the language "behavior is undefined" versus our (StackOverflow's) use of the phrase "Undefined Behavior." This edit is to provide some more clarification on what I'm trying to ask.

@Angew's interpretation of the Standardese is:

if (typeof(E1) == unsigned integral)
  value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
  value = E1 * 2^E2;
else
  value = undefined;

What this question really boils down to is this -- is the correct interpretation actually:

value = E1 left-shift-by (E2)

switch (typeof(E1))
{
case unsigned integral :
  value = E1 * 2^E2 % blah blah;
  break;

case signed integral :
  if (E1 >= 0)
  { 
    if (representable(E1 * 2^E2))
    {
      value = E1 * 2^E2;
    }
    else
    {
      value = undefined;
    }
  }
  break;
}

?

Sidenote, in looking at this in terms of psudocode makes it fairly clear in my mind that @Agnew's interpretation is the correct one.

3
I understand that the result value can be undefined because the representation of negative numbers is not explicitly set by the standard. But should this not mean that the result of the operation is undefined? I am not sure "behavior is undefined" equates exactly to the term "Undefined Behavior". I am more inclined to think the author meant the resulting value is undefined (but either way the language should be tightened). - Martin York
@LokiAstari There's already terminology in the standard for saying when the behavior is defined but no specific value is required; It will say the value is unspecified or indeterminate. I'd expect that language to be used if that were the intent. - bames53
As long as we're language-lawyering… Post C++11, p2 has been modified by CWG 1457 like so: open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1457 . The impact of this tweak is that one can shift a 1 bit of a signed type into the sign bit, and post-C++11 it is now not undefined behavior. This was done so that one could (e.g.) 1 << 31 to get 0x80000000 for a 32 bit int without the gods of UB reigning down on your head. This was a relief for authors of std::numeric_limits<int>::min() which is often computed this way, and is also constexpr, meaning UB is caught at compile time. - Howard Hinnant
@HowardHinnant FYI, I reference you comment in my answer to this question. - Shafik Yaghmour
Thanks for the attribute! - Howard Hinnant

3 Answers

8
votes

Yes, I would say it's undefined. If we translate the standardese to pseudo-code:

if (typeof(E1) == unsigned integral)
  value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
  value = E1 * 2^E2;
else
  value = undefined;

I'd say the reason why they're explicit about the right-hand operand and not about the left-hand one is that the paragrpah you quote (the one with the right-hand operand case) applies to both left and right shifts.

For the left-hand operand, the ruling differs. Left-shifting a negative is undefined, right-shifting it is implementation-defined.

5
votes

Should this be interpreted to mean that left-shifting any negative number is UB?

Yes, the behavior is undefined when given any negative number. The behavior is only defined when both of the following are true:

  • the number is non-negative
  • E1 × 2E2 is representable in the result type

That's literally what "if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined," is saying:

if X and Y
  then Z
else U
1
votes

Answer as per the Question:

The question really is:

Can we equate the term "behavior is undefined" equates exactly to the term "Undefined Behavior".

As it is currently worded it means "Undefined Behavior."

Personal comment about the situation

But I am not convinced that is the authors intention.
If it is the authors intention, then we should probably have a note explaining why. But I am more inclined to believe the author meant that the result of that operation is undefined because the representation of negative numbers is not explicitly defined by the standard. If the representation of negative numbers is not explicitly defined for negatives, then moving the bits around would lead to an undefined value.

Either way, the wording (or explanation) needs to be tightened/expanded to make it less ambiguous.