You have two different conversion routines, one for the integer part and one for the fractional part. You understand how to convert 1864 to binary, but you have problems converting .78 to binary. Note: you must convert the actual fraction held in memory for the float 1864.78, which is 1864.780029 (fraction 0.780029), not 0.78. That appears to be where your "rounding" confusion is coming from.
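A minimal sketch for checking the stored value yourself. The helper names stored_fraction and show_stored are made up for illustration, and the code assumes a non-negative value that fits in an int on a platform where float is IEEE-754 single precision:

```c
#include <stdio.h>

/* Hypothetical helper: return the fractional part the float really holds.
 * Assumes a non-negative value whose integer part fits in an int. */
float stored_fraction (float f)
{
    return f - (int)f;
}

/* Hypothetical helper: print the value and fraction held in memory. */
void show_stored (float f)
{
    printf ("float    : %f\n", f);                   /* promoted to double */
    printf ("fraction : %f\n", stored_fraction (f));
}
```

Calling show_stored (1864.78f) should print 1864.780029 and 0.780029, the values you need to convert.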
To convert a fraction to its binary representation, multiply the fraction by 2. If the resulting number has an integer part of 1 (that is, the result is greater than or equal to 1), your binary representation of that bit is 1; if not, it is 0. If the bit is 1, subtract 1 from the number. Repeat until you have exhausted the number (the fraction reaches zero) or reached the limit of precision in question. For example:
number : 1864.78
float : 1864.780029 (actual nearest representation in memory)
integer : 1864
fraction : 0.780029
2 * 0.780029 = 1.560059 => integer part (1) fraction (0.560059) => '1'
2 * 0.560059 = 1.120117 => integer part (1) fraction (0.120117) => '1'
2 * 0.120117 = 0.240234 => integer part (0) fraction (0.240234) => '0'
2 * 0.240234 = 0.480469 => integer part (0) fraction (0.480469) => '0'
2 * 0.480469 = 0.960938 => integer part (0) fraction (0.960938) => '0'
2 * 0.960938 = 1.921875 => integer part (1) fraction (0.921875) => '1'
2 * 0.921875 = 1.843750 => integer part (1) fraction (0.843750) => '1'
2 * 0.843750 = 1.687500 => integer part (1) fraction (0.687500) => '1'
2 * 0.687500 = 1.375000 => integer part (1) fraction (0.375000) => '1'
2 * 0.375000 = 0.750000 => integer part (0) fraction (0.750000) => '0'
2 * 0.750000 = 1.500000 => integer part (1) fraction (0.500000) => '1'
2 * 0.500000 = 1.000000 => integer part (1) fraction (0.000000) => '1'
Note how the fractional value held in the float reaches exactly zero after 12 bits rather than running to your limit of digits. If you attempt to convert 0.78 (which cannot be represented exactly as the fraction of 1864.78 in a 32-bit floating-point value), the conversion differs at the 12th bit.
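If you want to see that difference directly, here is a rough sketch of the multiply-by-2 steps written to produce a fixed number of bits. The helper name frac_bits is an assumption for illustration; it works in double so that both 0.78 and the float's stored fraction can be fed through the same routine:

```c
/* Hypothetical helper: write exactly nbits binary digits of the fraction
 * fv (0 <= fv < 1) into out, which must hold at least nbits + 1 bytes. */
void frac_bits (double fv, int nbits, char *out)
{
    int i;

    for (i = 0; i < nbits; i++) {
        fv *= 2.0;                  /* shift the next bit into the integer part */
        if (fv >= 1.0) {
            out[i] = '1';
            fv -= 1.0;              /* remove the bit just emitted */
        }
        else
            out[i] = '0';
    }
    out[i] = 0;                     /* nul-terminate */
}
```

With 12 bits, frac_bits (0.78, 12, buf) gives 110001111010, while passing the stored fraction (1864.78f - 1864) gives 110001111011, so the two differ only in the last bit.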
Once you have converted your fractional part to binary, you can continue with conversion into IEEE-754 single precision format. e.g.:
integer : 11101001000
fraction : 110001111011
sign bit : 0
The normalization for the biased exponent is:
11101001000.110001111011 => 1.1101001000110001111011 x 2^10
unbiased exponent:  10
exponent bias    : 127
__________________+____
biased exponent  : 137
binary exponent  : 10001001
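The same arithmetic can be sketched in code. The helper names biased_exponent and exp_bits are hypothetical, and the sketch assumes a positive, normal value (no zero, subnormal, infinity or NaN handling):

```c
/* Hypothetical helper: normalize f to the form 1.xxx * 2^e and return
 * e + 127. Assumes f is a positive, normal value. */
int biased_exponent (float f)
{
    int e = 0;

    while (f >= 2.0f) { f /= 2.0f; e++; }   /* move the binary point left  */
    while (f <  1.0f) { f *= 2.0f; e--; }   /* move the binary point right */

    return e + 127;                         /* add the single precision bias */
}

/* Hypothetical helper: write the 8-bit exponent field as a string. */
void exp_bits (int biased, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((biased >> (7 - i)) & 1) ? '1' : '0';
    out[8] = 0;
}
```

For 1864.78f the point moves 10 places, so biased_exponent returns 137 and exp_bits yields 10001001.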
Conversion to 'hidden bit' format to form mantissa:
1.1101001000110001111011 => 11010010001100011110110 (leading 1 dropped, padded with a trailing 0 to fill 23 bits)
Then use the sign bit + excess 127 exponent + mantissa to form the IEEE-754 single precision representation:
IEEE-754 Single Precision Floating Point Representation
 0 1 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
|s|      exp      |                  mantissa                   |
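To check a hand conversion against what the machine actually stores, you can copy the float's bytes into an integer and pull the fields apart. This is only a sketch: the helper names float_bits and show_fields are invented here, and it assumes float is a 32-bit IEEE-754 single precision value:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the raw bits of a float into a 32-bit integer.
 * Assumes sizeof (float) == 4 and IEEE-754 single precision. */
uint32_t float_bits (float f)
{
    uint32_t u;

    memcpy (&u, &f, sizeof u);      /* avoids aliasing issues of pointer casts */
    return u;
}

/* Hypothetical helper: print the sign, biased exponent and mantissa fields. */
void show_fields (float f)
{
    uint32_t u = float_bits (f);

    printf ("raw      : 0x%08X\n", (unsigned)u);
    printf ("sign     : %u\n", (unsigned)(u >> 31));
    printf ("exponent : %u\n", (unsigned)((u >> 23) & 0xFF));
    printf ("mantissa : 0x%06X\n", (unsigned)(u & 0x7FFFFF));
}
```

For 1864.78f the raw word should be 0x44E918F6, with a sign of 0 and a biased exponent of 137, which matches the bit pattern worked out by hand.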
Look it over and let me know if you have further questions. If you wanted a simple routine to fill a character array with the resulting conversion, you could do something similar to the following to convert a floating point fraction part to binary:
#include <stddef.h>     /* NULL */

#define MANTISSA 23
...
/** return string containing binary representation of fraction
 *  The function takes a float as an argument and computes the
 *  binary representation of the fractional part of the float.
 *  On success, the function returns a nul-terminated string
 *  containing the binary value, or NULL if 's' is NULL. The
 *  conversion is limited to the length of your MANTISSA (23-bits
 *  for single precision, 52-bits for double precision). You must
 *  ensure you provide a buffer for 's' of at least MANTISSA + 1
 *  bytes. The sign is ignored, and the magnitude of fvalue is
 *  assumed to fit in an int.
 */
char *fpfrc2bin (char *s, float fvalue)
{
    if (!s)                             /* validate buffer */
        return NULL;

    /* obtain fractional value from fvalue */
    float fv = fvalue < 0 ? -fvalue : fvalue;
    fv -= (int)fv;
    char *p = s;
    unsigned char it = 0;

    *p = 0;                             /* empty string if no fraction */
    while (fv > 0 && it < MANTISSA)
    {   /* convert fraction */
        fv = fv * 2.0f;
        *p++ = (fv >= 1.0f) ? '1' : '0';
        *p = 0;                         /* nul-terminate */
        if (fv >= 1.0f)                 /* remove the bit just emitted */
            fv -= 1.0f;
        it++;
    }
    return s;
}