60
votes

I know that the behavior of >> on signed integer can be implementation dependent (specifically, when the left operand is negative).

What about the others: ~, >>, &, ^, |? When their operands are signed integers of built-in type (short, int, long, long long), are the results guaranteed to be the same (in terms of bit content) as if their type is unsigned?

4
There is nothing confusing about ~, & , ^, | since they operate on bit level. Signed/unsigned doesn't matter. >> is usually implemented arithmetic right shift (correctly / 2 number) if signed, and logical right shift (fill with zero) if unsigned. << also doesn't have any confusion, since we will just remove the bit on the left (and fill with zero on the right).nhahtdh
Did you mean << rather than >> in your list of "other" operators?Toby Speight
removed C++ tag - C and C++ are different languages and questions shouldn't be dual tagged unless they're about interoperability or something; the existing answers all address C only.M.M

4 Answers

52
votes

For negative operands, << has undefined behavior and the result of >> is implementation-defined (usually as "arithmetic" right shift). << and >> are conceptually not bitwise operators. They're arithmetic operators equivalent to multiplication or division by the appropriate power of two for the operands on which they're well-defined.

As for the genuine bitwise operators ^, ~, |, and &, they operate on the bit representation of the value in the (possibly promoted) type of the operand. Their results are well-defined for each possible choice of signed representation (twos complement, ones complement, or sign-magnitude) but in the latter two cases it's possible that the result will be a trap representation if the implementation treats the "negative zero" representation as a trap. Personally, I almost always use unsigned expressions with bitwise operators so that the result is 100% well-defined in terms of values rather than representations.

Finally, note that this answer as written may only apply to C. C and C++ are very different languages and while I don't know C++ well, I understand it may differ in some of these areas from C...

12
votes
  • A left shift << of a negative value has undefined behaviour;
  • A right shift >> of a negative value gives an implementation-defined result;
  • The result of the &, | and ^ operators is defined in terms of the bitwise representation of the values. Three possibilities are allowed for the representation of negative numbers in C: two's complement, ones' complement and sign-magnitude. The method used by the implementation will determine the numerical result when these operators are used on negative values.

Note that the value with sign bit 1 and all value bits zero (for two's complement and sign-magnitude), or with sign bit and all value bits 1 (for ones’ complement) is explicitly allowed to be a trap representation, and in this case if you use arguments to these operators that would generate such a value the behaviour is undefined.

4
votes

The bit content will be the same, but the resulting values will still be implementation dependent.

You really shouldn't see the values as signed or unsigned when using bitwise operations, because that is working on a different level.

Using unsigned types saves you from some of this trouble.

2
votes

The C89 Standard defined the behavior of left-shifting signed numbers based upon bit positions. If neither signed nor unsigned types have padding bits, the required behavior for unsigned types, combined with the requirement that positive signed types share the same representation as unsigned types, would imply that the sign bit is immediately to the left of the most significant value bit.

This, in C89, -1<<1 would be -2 on two's-complement implementations which don't have padding bits and -3 on ones'-complement implementations which don't have padding bits. If there are any sign-magnitude implementations without padding bits, -1<<1 would equal 2 on those.

The C99 Standard changed left-shifts of negative values to Undefined Behavior, but nothing in the rationale gives any clue as to why (or even mentions the change at all). The behavior required by C89 may have been less than ideal in some ones'-complement implementations, and so it would made sense to allow those implementations the freedom to select something better. I've seen no evidence to suggest that the authors of the Standard didn't intended that quality two's-complement implementations should continue to provide the same behavior mandated by C89, but unfortunately they didn't actually say so.