6
votes

Suppose that we define:

short x = -1;
unsigned short y = (unsigned short) x;

According to the C99 standard:

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (ISO/IEC 9899:1999 6.3.1.3/2)

So, assuming two bytes for short and a two's complement representation, the bit patterns of these two integers are:

x = 1111 1111 1111 1111 (value of -1),
y = 1111 1111 1111 1111 (value of 65535).

Since -1 is not in the value range for unsigned short, and the maximum value that can be represented in an unsigned short is 65535, 65536 is added to -1 to get 65535, which is in the range of unsigned short. Thus the bits remain unchanged in casting from short to unsigned short, though the represented value is changed.

But, the standard also says that representations may be two's complement, one's complement, or sign and magnitude. "Which of these applies is implementation-defined,...." (ISO/IEC 9899:1999 6.2.6.2/2)

On a system using one's complement, x would be represented as 1111 1111 1111 1110 before casting, and on a system using sign and magnitude representation, x would be represented as 1000 0000 0000 0001. Both of these bit patterns represent a value of -1, which is not in the value range of unsigned short, so 65536 would be added to -1 in each case to bring the values into range. After the cast, both of these bit patterns would be 1111 1111 1111 1111.

So, preservation of the bit pattern in casting from int to unsigned int is implementation dependent.

It seems like the ability to cast an int to unsigned int while preserving the bit pattern would be a handy tool for doing bit-shifting operations on negative numbers, and I have seen it advocated as a technique for just that. But this technique does not appear to be guaranteed to work by the standard.

Am I reading the standard correctly here, or am I misunderstanding something about the details of the conversion from signed to unsigned types? Are two's complement implementations prevalent enough that the assumption of bit-pattern preservation under casting from int to unsigned is reasonable? If not, is there a better way to preserve bit patterns under a conversion from int to unsigned int?

Edit

My original goal was to find a way to cast an int to unsigned int in such a way that the bit pattern is preserved. I was thinking that a cast from int to intN_t could help accomplish this:

unsigned short y = (unsigned short)(int16_t) x;

but of course this idea was wrong! At best this would only enforce two's complement representation before casting to unsigned, so that the final bit pattern would be two's complement. I am tempted to just delete the question, yet I am still interested in ways to cast from int to unsigned int that preserve bit patterns, and @Leushenko has provided a really neat solution to this problem using unions. But, I have changed the title of the question to reflect the original intention, and I have edited the closing questions.

2
I find the idea of doing bit shifting operations on negative numbers deeply unsettling if you don't care about the relationship between the numerical value and the bit pattern. Why aren't you using unsigned types from the beginning if you're working with bit patterns? - user1084944
@Hurkyl-- Initially I was trying to find the Hamming weight of a binary representation of an int. Some methods to do this involve bit-shifting the int to the right. It was suggested to me that a cast to unsigned would make this possible, and that sounded too easy. So, here we are. My solution to the Hamming weight problem did not involve bit-shifting a negative int, but this seemed like an interesting corner to investigate. - ad absurdum

2 Answers

6
votes

If you specifically want to preserve bit-patterns above anything else, this seems like an excellent use case for going through a union rather than cast operators:

union S2US { short from; unsigned short to; };

...
short value = ...
unsigned short bits = (union S2US){ .from = value }.to;
...

As explained in footnote 95 (under section 6.5.2.3), "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6". Reinterpretation does not involve manipulating the data in any way, so depending on the member types, the extracted value is not guaranteed to have any direct arithmetic relationship to the inserted one, but it is guaranteed to have the exact same memory representation.

Since the sizes of the signed and unsigned versions of an integer type are the same (6.2.5 p6), and all members of a union must begin their storage at the same location (6.7.2.1 p16), a union that only contains signed and unsigned integers of the same width must copy all of the bits faithfully from one to the other, in either direction.

2
votes

Is it acceptable to cast from (int) to (unsigned)(intN_t) to preserve bit patterns?

Often, yes, but C does not specify that it does so for all values. C tries to maintain values during type conversions, not bit patterns.

Should a value, say x, be representable as int, unsigned and the chosen intN_t, then that value's bit pattern does not change. So the question relates to when x is negative.

C specifies that conversion to any unsigned type copes with out-of-range values by adding/subtracting the "maximum value of the unsigned type + 1" until the result is in range. Should the signed type use 2's complement encoding, the pattern of the lower significant bits will match the target unsigned type.

Conversion of an out-of-range value to a signed integer type is implementation-defined behavior - hence OP's dilemma.

C only specifies that the range of a signed integer type and its corresponding unsigned integer type both need to encode the range: 0 to the signed type maximum. IOW, INT_MAX == UINT_MAX is allowed. On such rare platforms, converting from int to unsigned to int loses the sign.


If code needs to preserve some signed type's bit pattern, a union with an array of unsigned char works in all cases.

union uA {
  some_signed_int_type i;
  unsigned char uc[sizeof (some_signed_int_type)];
};

A union with a fixed-width unsigned type (these types are optional) whose maximum is greater than the signed maximum also maintains the bit pattern. Do not rely on the value being the same, even for positive values. Fixed-width types have no padding bits, whereas the general signed types may.

assert(uintN_MAX > some_signed_type_max);
union uB {
  some_signed_int_type i;
  uintN_t u;
};

The central benefit of unsigned char and (u)intN_t is that these types are specified to not have padding.