0
votes

I've been trying to convert the value of the S&P 500 which today is 1864.78 to how it would be represented in IEEE single precision format in memory.

Converting the left of the decimal (1864) is easy.

11101001000.

But how do I get the binary representation of the fractional part (.78)? I tried the repeated multiply-by-2 technique, but it produces more bits than will fit in the IEEE single precision format:

.78*2=1.56 1

.56*2=1.12 1

.12*2=.24 0

.24*2=.48 0

.48*2=.96 0

.96*2=1.92 1

.92*2=1.84 1

.84*2=1.68 1

.68*2=1.36 1

.36*2=.72 0

.72*2=1.44 1

.44*2=.88 1 (rounded up because now we have 23 total bits)

11101001000.110001111011 = 23 bits for mantissa

Add 0 for sign

0 11101001000.110001111011

Now I need to move the binary point over 10 places

1.1101001000110001111011 x 2^10 exponent is 10 now

add a 0 bit to make full mantissa 23 bits

1.11010010001100011110110

exponent is 10 so 10 + 127 = 137

which is equal to 10001001

so 0 10001001 11010010001100011110110 which is a 32 bit number.

Does this look like a decent approach? In writing this question step-by-step I was actually able to work through it on my own.

I tested the conversion with this: http://www.h-schmidt.net/FloatConverter/IEEE754.html

In general, just stop making new digits when you reach the precision limit, although IEEE 754 has rounding rules to follow, too. Many, many decimal parts will not be precisely representable in binary in this way, so you have to approximate. – Crowman

Thanks! I was able to answer the question actually because making this post I had to write my thoughts down step-by-step. The video here really helped if anyone else has an issue walking through it. youtube.com/watch?v=MIrQtuoT5Ak – user2400026

2 Answers

7
votes

You have two different conversions to perform: the integer part and the fractional part, each to binary. You understand how to convert 1864 to binary, but you have problems converting .78 to binary. Note: you must convert the actual fraction held in memory for the float 1864.78, which is 1864.780029, i.e. the fraction 0.780029, not 0.78. That appears to be where your "rounding" confusion comes from.

To convert a fraction to its binary representation, you multiply the fraction by 2; if the resulting number has an integer part of 1, that bit of the binary representation is 1, otherwise it is 0. If the integer part is 1, you subtract 1 from the number, then repeat until you have exhausted the fraction or reached the limit of precision in question. For example:

number   : 1864.78
float    : 1864.780029  (actual nearest representation in memory)
integer  : 1864
fraction : 0.780029

 2 * 0.780029 = 1.560059  =>  integer part (1) fraction (0.560059)  =>  '1'
 2 * 0.560059 = 1.120117  =>  integer part (1) fraction (0.120117)  =>  '1'
 2 * 0.120117 = 0.240234  =>  integer part (0) fraction (0.240234)  =>  '0'
 2 * 0.240234 = 0.480469  =>  integer part (0) fraction (0.480469)  =>  '0'
 2 * 0.480469 = 0.960938  =>  integer part (0) fraction (0.960938)  =>  '0'
 2 * 0.960938 = 1.921875  =>  integer part (1) fraction (0.921875)  =>  '1'
 2 * 0.921875 = 1.843750  =>  integer part (1) fraction (0.843750)  =>  '1'
 2 * 0.843750 = 1.687500  =>  integer part (1) fraction (0.687500)  =>  '1'
 2 * 0.687500 = 1.375000  =>  integer part (1) fraction (0.375000)  =>  '1'
 2 * 0.375000 = 0.750000  =>  integer part (0) fraction (0.750000)  =>  '0'
 2 * 0.750000 = 1.500000  =>  integer part (1) fraction (0.500000)  =>  '1'
 2 * 0.500000 = 1.000000  =>  integer part (1) fraction (0.000000)  =>  '1'

Note how the floating-point fractional value reaches zero here rather than running into your limit of digits. If you attempt to convert 0.78 itself (which is not capable of exact representation as the fraction of 1864.78 in a 32-bit floating point value), you will get a different result starting at the 12th bit.

Once you have converted your fractional part to binary, you can continue with conversion into IEEE-754 single precision format. e.g.:

decimal  : 11101001000
fraction : 110001111011
sign bit : 0

The normalization for the biased exponent is:

 11101001000.110001111011  =>  1.1101001000110001111011

 unbiased exponent: 10
     exponent bias: 127
 __________________+____

   biased exponent: 137
   binary exponent: 10001001

Conversion to 'hidden bit' format (dropping the implicit leading 1) to form the mantissa, padded with a trailing 0 to fill 23 bits:

1.1101001000110001111011  =>  11010010001100011110110

Then use the sign bit + excess 127 exponent + mantissa to form the IEEE-754 single precision representation:

IEEE-754 Single Precision Floating Point Representation

  0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0
 |- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
 |s|      exp      |                  mantissa                   |

Look it over and let me know if you have further questions. If you wanted a simple routine to fill a character array with the resulting conversion, you could do something similar to the following to convert a floating point fraction part to binary:

#define MANTISSA 23
...

/** return string containing binary representation of fraction
 *  The function takes a float as an argument and computes the
 *  binary representation of the fractional part of the float.
 *  The function returns the null-terminated string containing
 *  the binary value. The conversion is limited to the length
 *  of your MANTISSA (23 bits for single precision, 52 bits for
 *  double precision). You must ensure you provide a buffer for
 *  's' of at least MANTISSA + 1 bytes.
 */
char *fpfrc2bin (char *s, float fvalue)
{
    /* obtain fractional value from fvalue */
    float fv = fvalue >= 1.0f ? fvalue - (int)fvalue : fvalue;
    char *p = s;
    unsigned char it = 0;

    while (fv > 0.0f && it < MANTISSA)
    {   /* convert fraction, one bit per iteration */
        fv *= 2.0f;
        *p++ = (int)fv ? '1' : '0';
        if ((int)fv >= 1)   /* remove the integer part */
            fv -= 1.0f;
        it++;
    }
    *p = '\0';  /* nul-terminate (even when the loop never runs) */

    return s;
}
1
vote

You're 1 bit too short: the IEEE 754 binary32 format uses a 24-bit significand, stored in 23 bits with an implicit leading 1.

So the last 2 bits are:

0.44*2=0.88  0  =>  1  (receives the carry from the bit below)
0.88*2=1.76  1  =>  0  (the next bit, 0.76*2=1.52, rounds this up: 1+1 = 10, carry)

which gives the number

1.11010010001100011110110₂ × 2^10

You've already calculated the biased exponent (137 = 10001001₂), so the resulting bit pattern can be constructed directly:

0 10001001 11010010001100011110110