0
votes

I am writing a generic routine for converting fixed-point numbers between decimal and binary representations.

For positive numbers the processing is simple, however when things come to negative ones I found divergent sources. Someone says there is a single bit used to hold the sign while others say the whole number should be represented in a pseudo integer using 2's complement even it is negative.

Please anyone tell me which source is correct or is there a standard representation for signed fixed point numbers?

Additionally, if the 2's complement representation was correct then how to represent negative numbers with zero integer part. For example -0.125?

2

2 Answers

2
votes

Fixed-point numbers are just binary values where the place values have been changed. Assigning place values to the bits is an arbitrary human activity, and we can do it in any way that makes sense. Normally we talk about binary integers so it is convenient to assign the place value 2^0 = 1 to the LSB, 2^1=2 to the bit to the left of the LSB, and so on. For an N bit integer the place value of the MSB becomes 2^(N-1). If we want a two's-complement representation, we change the place value of the MSB to -2^(N-1) and all of the other bit place values are unchanged.

For fixed-point values, if we want F bits to represent a fractional part of the number, then the place value of the LSB becomes 2^(0-F) and the place value of the MSB becomes 2^(N-1-F) for unsigned numbers and -2^(N-1-F) for signed numbers.

So, how would we represent -0.125 in a two's-complement fixed-point value? That is equal to 0.875 - 1, so we can use a representation where the place value of the MSB is -1 and the value of all of the other bits adds up to 0.875. If you choose a 4-bit fixed-point number with 3 fraction bits you would say that 1111 binary equals -0.125 decimal. Adding up the place values of the bits we have (-1) + 0.5 + 0.25 + 0.125 = -0.125. My personal preference is to write the binary number as 1.111 to note which bits are fraction and which are integer.

The reason we use this approach is that the normal integer arithmetic operators still work.

1
votes

It's easiest to think of fixed-point numbers as scaled integers — rather than shifted integers. For a given fixed-point type, there is a fixed scale which is a power of two (or ten). To convert from the real value to the integer representation, multiply by that scale. To convert back again, simply divide. Then the issue of how negative values are represented becomes a detail of the integer type with which you are representing your number.

Please anyone tell me which source is correct...

Both are problematic.

Your first source is incorrect. The given example is not...

the same as 2's complement numbers.

In two’s complement, the MSB's (most significant bit's) weight is negated but the other bits still contribute positive values. Thus a two’s complement number with all bits set to 1 does not produce the minimum value.

Your second source could be a little misleading where it says...

shifting the bit pattern of a number to the right by 1 bit always divide the number by 2.

This statement brushes over the matter of underflow that occurs when the LSB (least significant bit) is set to 1, and the resultant rounding. Right-shifting commonly results in rounding towards negative infinity while division results in rounding towards zero (truncation). Both produce the same behavior for positive numbers: 3/2 == 1 and 3>>1 == 1. For negative numbers, they are contrary: -3/2 == -1 but -3>>1 == -2.

...is there a standard representation for signed fixed point numbers?

I don't think so. There are language-specific standards, e.g. ISO/IEC TR 18037 (draft). But the convention of scaling integers to approximate real numbers of predetermined range and resolution is well established. How the underlying integers are represented is another matter.

Additionally, if the 2's complement representation was correct then how to represent negative numbers with zero integer part. For example -0.125?

That depends on the format of your integer and your choice of radix. Assuming a 16-bit two’s complement number representing binary fixed-point values, the scaling factor is 2^15 which is 32,768. Multiply the value to store as an integer: -0.125*32768. == -4096 and divide to retrieve it: -4096/32768. == -0.125.