1
votes

I'm having a little difficult understanding how to work with bits in C, with regards to two's complement.

This is part of a homework assignment, but I'm not looking for a code answer, but rather to understand what's going on with two's complement representation.

My task is to restrict a number to a certain number of bits (n), and to determine whether or not a given number can be represented in two's complement in n number of bits.

According to examples, 5 CANNOT be represented as a 3-bit integer, while -4 CAN be represented as a 3-bit integer.

Why is this?


edit: I worked out a detailed explanation of my thought process, but realized I was completely off so decided to omit it.

My original reasoning was to see if it makes sense to allow 5 and -5, 4 and -4 to be represented in 3-bits. But that doesn't make sense because that doesn't really answer the problem.

I understand how 5 and -4 is represented as two's complement. Ex, as 4-bits:

5: 0101

-4: 1100


second edit:

For clarification, decided to add my original reasoning:

5 is 0101, and -5 is 1011.
I can see how 5 cannot be represented in two's complement when restricted to 3 bits, because without that 4th bit, we cannot indicate that -5 is a negative number. We need that extra 1 in 1011. If we could only have up to 3 bits, we'd have 011, and there'd be no way to differentiate -5 from 3, the latter of which is 0011 in 4 bits, and 011 in 3 bits. Is this reasoning correct?

4 is 0100, and -4 is 1100.
Here I'm confused. I don't see why -4 can be represented in 3-bits as a two's complement integer.

4 is represented as 0100, and 100 with 3 bits. -4 is, if we start with 4 (100), we flip 100 (011), and add 1 (100), we are left with 100 again (in 3 bits). In 4 bits, I believe this is represented as 1100.

My confusion is, don't we need an extra 1, for 1100, to differentiate -4 from 4, which is 0100? If all we have is 3 bits, how can we differentiate 100 from 100?

1
Consider this: if the leftmost bit signifies a negative or positive number, can you safely remove the first 0 from "5 / 0101"? Can you safely remove the first 1 from "4 / 1100"?Jongware
What's the biggest number you can represent with n digits in base 10? And can you answer the question for sign magnitude? Hint: With two's complement, you can represent one negative number more (and no negative 0).mafso
@Jongware I decided to add my old reasoning under a second edit in the original post. I thought about what you said earlier, but could not come to any conclusion.Spag Guy
"If all we have is 3 bits ..", isn't that the same situation you find yourself in with C? ("If all we have is 32 bits..") Think about how C addresses this.Jongware
@Jongware Hmm ok. Well in C, our max representation for 32 bits in two's complement is 01111111 11111111 11111111 11111111. Any higher and you get -1. I'm not sure how to answer your question, my understanding is that you get an overflow because a positive + a positive cannot equal a negative.Spag Guy

1 Answers

5
votes

In order to understand this it helps to remember that, despite its ubiquity on modern hardware, two's-complement is not a law of nature but a human construct.

Say we want to pack an integer into n bits. Let n=3 for simplicity and brevity. Clearly we can choose any 23=8 integers we please, providing we're consistent, but some choices make life easier than others when it comes to arithmetic.

Unsigned integers are straightfoward. There's a natural mapping:

Encoding  Value
 000        0
 001        1
 010        2
 011        3
 100        4
 101        5
 110        6
 111        7

This is just the binary encoding of the value, with zero padding. Zero has all bits zero, which is handy. Addition and subtraction just work, modulo 2n.

Signed integers are trickier. Before the universal adoption of twos' complement, at least two other representations were also used: signed magnitude (SM) and ones' complement (1C).

Encoding   SM    1C
 000        0     0 
 001        1     1
 010        2     2
 011        3     3
 100       -0    -3
 101       -1    -2
 110       -2    -1
 111       -3    -0

Both encodings, SM and 1C, use n bits to store signed integers from -(2n-1-1) to 2n-1-1. Both encodings store positive integers the same way as before. Good: both go the same distance in the positive and negative direction. Good: both make testing for negativeness easy (apart from zero, which often needs special handling anyway). Good: both make negating a number easy (SM: flip the sign bit; 1C: flip all the bits). Bad: both include an unnecessary "negative zero", which complicates testing for equality, as well as arithmetic.

Which brings us on to two's complement (2C).

Encoding   2C
 000        0
 001        1
 010        2
 011        3
 100       ???
 101       -3
 110       -2
 111       -1

Here we start with the positive integers as before, and get to the negative integers by "counting backwards" from zero until we meet in the middle. Good: this makes arithmetic very natural, as easy as in the unsigned case. Bad: negating is trickier than in SM or 1C.

But what do we do with the row with the ???'s? We already have all the numbers we had in 1C and SM. We could choose 4, or -4, or maybe "undefined". The convention in 2C is that we choose the negative value. This gives us a range of -4 to +3, or more generally -2n-1 to 2n-1-1. This asymmetric range is awkward (we have -4 but not +4, and it turns out -4 is its own negative![1]), but preserves the property that all negative numbers have the top bit set, as well as ensuring that every encoding has a value associated with it.

So (finally!) on to your question. Why can -4 be represented in two's complement as a 3-bit integer? Because that's the least bad option.

Further reading: signed number representations on wikipedia.


[1] Here's a test program to demonstrate how much of a mess this is. Here int is 32 bits wide.

[~]% cat 2c.c
#include <stdio.h>
#include <limits.h>

int main(void) {
    int i = INT_MAX;
    int j = INT_MIN;

    printf ("   i = % d (hex %x)\n", i, i);
    printf ("   j = % d (hex %x)\n", j, j);
    printf ("  -i = % d (hex %x)\n", -i, -i);
    printf ("  -j = % d (hex %x)\n", -j, -j);
    printf ("i*-1 = % d (hex %x)\n", i*-1, i*-1);
    printf ("j*-1 = % d (hex %x)\n", j*-1, j*-1);
    printf ("i/-1 = % d (hex %x)\n", i/-1, i/-1);
    printf ("j/-1 = % d (hex %x)\n", j/-1, j/-1);

    return 0;
}

Note that negating the most negative integer (INT_MIN) is undefined behaviour per the C standard, so clang is perfectly justified in having one out of three methods of negation kill the program:

[~]% clang -Wall 2c.c -o 2c && ./2c
   i =  2147483647 (hex 7fffffff)
   j = -2147483648 (hex 80000000)
  -i = -2147483647 (hex 80000001)
  -j = -2147483648 (hex 80000000)
i*-1 = -2147483647 (hex 80000001)
j*-1 = -2147483648 (hex 80000000)
i/-1 = -2147483647 (hex 80000001)
zsh: floating point exception  ./2c

This has been a source of security vulnerabilities (sadly most of the real ones don't get fun walkthroughs).