11
votes

Is this always technically correct:

unsigned abs(int n)
{
    if (n >= 0) {
        return n;
    } else {
        return -n;
    }
}

It seems to me that if -INT_MIN > INT_MAX, the "-n" expression could overflow when n == INT_MIN, since -INT_MIN would be outside the range of int. But on my compiler this seems to work fine... is this an implementation detail or a behaviour that can be relied upon?

Longer version

A bit of context: I'm writing a C++ wrapper for the GMP integer type (mpz_t), taking inspiration from the existing GMP C++ wrapper (called mpz_class). When handling addition of an mpz_t and a signed integer, there is code like this:

static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
{
  if (l >= 0)
    mpz_add_ui(z, w, l);
  else
    mpz_sub_ui(z, w, -l);
}

In other words, if the signed integer is positive, add it using the unsigned addition routine; if it is negative, add it using the unsigned subtraction routine. Both *_ui routines take an unsigned long as their last argument. Is the expression

-l

at risk of overflowing?

7
There is one more negative twos-complement integer than positive, so yes, it can overflow. – President James K. Polk

7 Answers

10
votes

If you want to avoid the overflow, you should first cast n to an unsigned int and then apply the unary minus to it.

unsigned abs(int n) {
  if (n >= 0)
    return n;
  return -((unsigned)n);
}

In your original code the negation happens before the type conversion, so the behavior is undefined if n < -INT_MAX.

When negating an unsigned expression, there can never be overflow. Instead, the result is reduced modulo 2^x, where x is the number of value bits of the unsigned type.
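
Applied to the GMP dispatch from the question, the same idea would look roughly like this (a sketch, not part of mpz_class itself: mpz_add_ui and mpz_sub_ui are GMP's real routines, and the cast-before-negate is the only change):

#include <gmp.h>

static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
{
  if (l >= 0)
    mpz_add_ui(z, w, (unsigned long) l);
  else
    /* Convert first, then negate: unsigned negation wraps modulo
       ULONG_MAX + 1 and yields the magnitude of l, even for LONG_MIN. */
    mpz_sub_ui(z, w, -(unsigned long) l);
}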

3
votes

There is no such thing as overflow of unsigned integers in C. Arithmetic on them is defined as computation modulo their maximum value plus one; they may "wrap", but technically this is not considered overflow. So the conversion part of your code is fine, although in extreme cases you might encounter surprising results.

The only point where you could have overflow in your code is the - of a signed type. There is exactly one value of a signed type that might not have a positive counterpart: the minimum value. For that you'd have to do a special check, e.g. for int

if (INT_MIN < -INT_MAX && n == INT_MIN) /* do something special */
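
Put together, a complete function along those lines might look like this (a sketch; the name absolute_value is just illustrative, and it assumes the usual case where unsigned int can represent INT_MAX + 1):

#include <limits.h>

unsigned absolute_value(int n)
{
  if (INT_MIN < -INT_MAX && n == INT_MIN)
    return (unsigned) INT_MAX + 1u;  /* magnitude of INT_MIN on 2's complement */
  return (n < 0) ? (unsigned) -n : (unsigned) n;
}
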
2
votes

Most computers today use a two's complement number scale, which means the negative range is one larger than the positive, for example from -128 to 127. That means that if you can represent a positive number, you can represent its negative counterpart without worry.
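
For instance, printing the limits from <limits.h> shows that asymmetry on a typical system (the exact values are implementation-defined, -128..127 being the usual 2's-complement case for signed char):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* On common systems signed char runs from -128 to 127, so the
       most negative value has no positive counterpart of the same type. */
    printf("signed char range: %d .. %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("int range:         %d .. %d\n", INT_MIN, INT_MAX);
    return 0;
}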

0
votes

Maybe this could cope with the asymmetrical range of 2's-complement numbers:

#include <limits.h>

unsigned int abs(int n)
{
  unsigned int m;

  if (n == INT_MIN)
    m = INT_MAX + 1UL;  /* magnitude of INT_MIN on 2's complement, computed in unsigned long to avoid signed overflow */
  else if (n < 0)
    m = -n;             /* here n > INT_MIN, so -n cannot overflow */
  else
    m = n;

  return m;
}

0
votes

This should avoid undefined behavior and work with all representations of signed int (2's complement, 1's complement, sign and magnitude):

unsigned myabs(int v)
{
  return (v >= 0) ? (unsigned)v : (unsigned)-(v+1)+1;
}

Modern compilers are able to remove the redundant -1+1 and recognize the idiom for computing the absolute value of a signed integer.

Here's what gcc produces:

_myabs:
    movl    4(%esp), %eax
    cltd
    xorl    %edx, %eax
    subl    %edx, %eax
    ret
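
For what it's worth, a quick sanity check of the edge cases (a test sketch only, assuming the usual case where unsigned int can hold INT_MAX + 1):

#include <assert.h>
#include <limits.h>

unsigned myabs(int v);  /* as defined above */

int main(void)
{
    assert(myabs(0) == 0u);
    assert(myabs(-5) == 5u);
    assert(myabs(INT_MAX) == (unsigned) INT_MAX);
    /* No signed overflow occurs on the way, even for the most negative value. */
    assert(myabs(INT_MIN) == (unsigned) INT_MAX + 1u);
    return 0;
}
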
-1
votes

Yes, it will overflow, to itself.

#include <stdio.h>
#include <limits.h>

int main(void) {
    int foo = INT_MIN;
    /* -foo overflows: undefined behaviour, but on typical
       2's-complement machines it wraps back to INT_MIN */
    if (-foo == INT_MIN) printf("overflow\n");
    return 0;
}

prints "overflow"

However, this is merely typical behavior, not required by the standard. If you wish to play it safe, see the accepted answer for how.

-1
votes

Very good question, which exposes the differences between C89, C99 and C++. So this is some commentary on these Standards.

In C89, where n is an int:

(unsigned)n

is not well defined for all n: there's no restriction on the conversion between signed and unsigned int except that the representation of a non-negative signed int is identical to that of an unsigned int of the same value, provided that value is representable.

This was considered a defect, and in C99, unfortunately there is a faulty attempt to restrict the encoding to two's complement, one's complement, or signed magnitude with the same number of bits. Unfortunately the C committee didn't have much mathematical knowledge and completely botched the specification: on the one hand it is ill-formed due to circular definition and therefore non-normative, and on the other hand, if you excuse this fault, it is a gross overconstraint, which, for example, excludes a BCD representation (used in C on old IBM mainframes), and also allows the programmer to hack the value of an integer by fiddling bits of the representation (which is very bad).

C++ went to some trouble to provide a better specification, however it suffers the same circular definition fault.

Roughly speaking, the representation of a value v is an array of unsigned char with sizeof(v) elements. An unsigned char has a power-of-two number of values and is required to be big enough to ensure it faithfully encodes any aliased data structure. The number of bits in an unsigned char is well defined as the binary log of the number of representable values.

The number of bits of any unsigned value is similarly well defined if it has a power-of-two number of values from 0 to 2^n - 1, via the canonical positional encoding scheme.

Unfortunately, the committee wanted to ask if there were any "holes" in the representation. For example, could you have a 31-bit integer on an x86 machine? I say unfortunately, because this is a badly formed question, and the answer is similarly improper.

The proper way to ask this question is to ask if the representation is full. It is not possible to talk about "the bits of a representation" for signed integers because the specification does not go from the representation to the values, it goes the other way. This may confuse a lot of programmers who incorrectly think a representation is a mapping from underlying bits to some value: a representation is a mapping from the values to the bits.

A representation is full if it is a surjection, that is, it is onto the whole range of the representation space. If the representation is full then there are no "holes", that is, unused bits. However that is not all. A representation of 255 values to an array of 8 bits cannot be full, yet there are no bits which are unused. There are no holes.

The problem is this: consider an unsigned int, then there are TWO distinct bitwise representations. There is the well defined array of log base 2 bits determined from the canonical encoding, and then there is the array of bits of the physical representation given by the aliasing of an array of unsigned char. Even if this representation is full there is no correspondence between the two kinds of bits.

We all know that the "high order bits" of the logical representation can be at one end of the physical representation on some machines and the other on other machines: it's called endian-ness. But in fact there's no reason the bits couldn't be permuted in any order at all, in fact there's no reason the bits should line up at all! Just consider adding 1 modulo the maximum value plus 1 as the representation to see this.

So now the problem is that for signed integers there is no canonical logical representation, rather there are several common ones: two's complement, for example. However as above this is unrelated to the physical representation. The C committee just couldn't understand that the correspondence between the values and physical representation cannot be specified by talking about bits. It must be specified entirely by talking about the properties of functions.

Because this was not done, the C99 standard contains non-normative gibberish and consequently all of the rules for behaviour of signed and unsigned integer conversions are non-normative gibberish as well.

Therefore it is not clear that

(unsigned)n

will actually produce the desired result for negative values.