5
votes

I am decoding a binary file, which has decimal numbers represented by four bytes, little endian. For example, 94 53 F0 40 represents 7.510202. Unfortunately, Python is giving me 7.51020240784.

When I try to parse this data using unpack("<f",sampledata)[0] I don't get exact representations of the original, due to the way Python stores values (for more information, see http://bugs.python.org/issue4114).

Unfortunately, I do need to get the exact same representation, regardless of discussions about the inaccuracy of floats, because I need to write these values to a text file with the same number of decimal places they had when they were originally written to the binary file.

I'd rather stick to Python if possible, but am happy to implement a solution in C if necessary. I cannot simply truncate the return value of the unpack function, because I cannot guarantee how many decimal places the original float had. For example, 0C 02 0F 41 from the original binary file represents 8.938 according to my hex editor, which has only 3 decimal places.

To be clear, I need to take four hex bytes as input and output either a text/ASCII or numeric representation of the IEEE 32-bit floating-point number that has the same number of decimal places as the creator of the file intended. I will use the output to create a CSV of the original binary data file, not to perform any calculations.

Any suggestions?

Example:

from __future__ import print_function
from struct import unpack

print("Should print 7.510202")

hexbytes = b"\x94\x53\xF0\x40"

# 10010100 01010011 11110000 01000000
# should print 7.510202

print(unpack("<f",hexbytes)[0])
No, 94 53 F0 40 does not represent 7.510202; that value isn't exactly representable in either single- or double-precision IEEE 754 binary floating-point. It represents the number 7.5102024078369140625 (exactly). If you need more information from the original text file (like the number of decimal places originally used to describe the number), you're going to have to track that information separately. By the time you pass to just a single-precision float, that information is gone. - Mark Dickinson
Well, both the correctly decoded file, AND my hex editor give me 7.510202. I don't care how they got there, but I wish to do the same. See imgur.com/Tmx3bD9 - Chris
Then your hex editor is probably just giving an output based on 7 significant digits. To get the same result in Python, do "%.7g" % x. - Mark Dickinson
You are burying your head in the sand. That number is not representable in a binary floating point type. Futile to pretend otherwise. There is no bug afflicting you. Everything is behaving as designed. If you want to round to 6dp, do so. - David Heffernan
@DavidHeffernan rounding to an arbitrary number of decimal places isn't as trivial as it seems. There's a round function but it only works if you already know how many decimals to the right are required, and for arbitrary input that can change drastically. - Mark Ransom
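
The exact value quoted in the comments can be checked from Python. This is a minimal sketch, assuming Python 2.7 or 3, where Decimal accepts a float and converts it without rounding; the variable names are illustrative only.

```python
from decimal import Decimal
import struct

raw = b"\x94\x53\xF0\x40"            # the four bytes from the question
(value,) = struct.unpack("<f", raw)  # float32 widened to a Python float (no loss)

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact = Decimal(value)
print(exact)  # 7.5102024078369140625

# Packing the literal 7.510202 selects the nearest float32, which need not
# have the same bit pattern as the bytes read from the file.
same_bits = struct.pack("<f", 7.510202) == raw
print(same_bits)
```

The digits beyond the seventh come from the stored binary value itself, which is why a number of significant digits has to be chosen when converting to text.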

3 Answers

5
votes

A 4-byte IEEE format floating point number holds approximately 7 significant decimal digits. What you want to do is round the result of unpack to a total of 7 digits. From there the normal Python conversion from float to string will hide all the floating point nastiness from you.

import math
import struct

def magnitude(x):
    # Number of digits before the decimal point (0 for x == 0)
    return 0 if x==0 else int(math.floor(math.log10(abs(x)))) + 1

def round_total_digits(x, digits=7):
    # Round so that `digits` significant digits remain in total
    return round(x, digits - magnitude(x))

>>> round_total_digits(struct.unpack('<f', b'\x94\x53\xF0\x40')[0])
7.510202
>>> round_total_digits(struct.unpack('<f', b'\x0C\x02\x0F\x41')[0])
8.938
>>> x = struct.unpack('<f', struct.pack('<f', 12345.67))[0]
>>> x
12345.669921875
>>> round_total_digits(x)
12345.67

Note that if your numbers did not originate from a direct conversion of a decimal number but were the result of a calculation, this could reduce the total accuracy. But not by much.
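
The same rounding to 7 significant digits can also be done at formatting time, along the lines of the "%.7g" suggestion in the comments. This is a sketch only: float32_to_text and its sig_digits parameter are hypothetical names, and 7 digits is an assumption about how the file was originally written.

```python
import struct

def float32_to_text(raw, sig_digits=7):
    """Decode four little-endian bytes and format with `sig_digits` significant digits.

    Hypothetical helper for illustration; not part of the struct module.
    """
    (value,) = struct.unpack("<f", raw)
    # %g drops trailing zeros, so 8.938000 is written as 8.938.
    return "%.*g" % (sig_digits, value)

print(float32_to_text(b"\x94\x53\xF0\x40"))  # 7.510202
print(float32_to_text(b"\x0C\x02\x0F\x41"))  # 8.938
```

Note that %g switches to exponent notation for very small or very large values (for example 1e-05), which may or may not be what the CSV should contain.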

1
votes
  uint32_t b = 0x40F05394;
  float f;
  memcpy(&f, &b, sizeof f);  /* needs <string.h>; avoids the strict-aliasing cast */

  printf("%.11f\n", f);

prints on my (little-endian) system:

7.51020240784

so the number of digits shown is controlled by the precision given to the f conversion specifier. The same applies in Python: you can simply request the number of digits to be printed.

Example:

print("%.11f" % (unpack("<f",hexbytes)[0]))

If the number of digits to be printed is variable in your text file, you have to also store this information in your text file.

Then in C you can print it:

      int p = 11;
      printf("%.*f\n", p, *(float *) &b);  // 11 here can be a variable

In Python:

     p = 11
     print("%.*f" % (p, (unpack("<f",hexbytes)[0])))  # 11 can be a variable

Of course, to get 0x40F05394 from 0x9453F040, you just need to reverse the order of the bytes.
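
The byte reordering can be sketched in Python with the struct module. The variable names here are illustrative; the bytes are the ones from the question.

```python
import struct

raw = b"\x94\x53\xF0\x40"  # bytes in file order (little endian)

# Reading the bytes as a little-endian unsigned 32-bit integer yields
# the constant used in the C snippet.
(as_uint,) = struct.unpack("<I", raw)
print("0x%08X" % as_uint)  # 0x40F05394

# Reversing the bytes and decoding as big endian gives the same float
# as decoding the original bytes as little endian.
reversed_raw = raw[::-1]
same_value = struct.unpack(">f", reversed_raw) == struct.unpack("<f", raw)
print(same_value)  # True
```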

0
votes

Here's an example of how to encode and decode in little endian. This doesn't address any rounding issues, but it looks like those were worked out in the answer above.

import csv, os
import struct

test_floats = [1.2, 0.377, 4.001, 5, -3.4]

## write test floats to a new csv file:
path_test_csv = os.path.abspath('data-test/test.csv')
print(path_test_csv)
test_csv = open(path_test_csv, 'w')
wr = csv.writer(test_csv)
for x in test_floats:
    wr.writerow([x])
test_csv.close()


## write test floats as binary
path_test_binary = os.path.abspath('data-test/test.binary')
test_binary = open(path_test_binary, 'wb')  ## binary mode, since struct.pack returns bytes
for x in test_floats:
    binary_data = struct.pack('<f', x)
    test_binary.write(binary_data)
test_binary.close()


## read in test binary
binary = open(path_test_binary, 'rb')
binary.seek(0,2) ## seeks to the end of the file (needed for getting number of bytes)
num_bytes = binary.tell() ## how many bytes are in this file is stored as num_bytes
# print num_bytes
binary.seek(0) ## seeks back to beginning of file
i = 0 ## index of bytes we are on
while i < num_bytes:
    binary_data = binary.read(4) ## reads in 4 bytes = 8 hex characters = 32-bits
    i += 4 ## we seeked ahead 4 bytes by reading them, so now increment index i
    unpacked = struct.unpack("<f", binary_data) ## <f denotes little endian float encoding
    print(unpacked[0])
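
Combining the decoding loop with the rounding from the accepted answer, the conversion to CSV that the question asks for might look like the sketch below. It assumes Python 3.4 or later (for struct.iter_unpack) and that 7 significant digits matches how the values were originally written; float32_records_to_csv is a hypothetical name, and an in-memory buffer stands in for the real files.

```python
import csv
import io
import struct

def float32_records_to_csv(raw, out, sig_digits=7):
    """Write each little-endian float32 in `raw` as one CSV row on `out`."""
    writer = csv.writer(out)
    usable = len(raw) - len(raw) % 4  # ignore a trailing partial record
    for (value,) in struct.iter_unpack("<f", raw[:usable]):
        # Format as text so the CSV holds the rounded decimal, not repr(float).
        writer.writerow(["%.*g" % (sig_digits, value)])

# The two byte sequences from the question, back to back.
sample = b"\x94\x53\xF0\x40" + b"\x0C\x02\x0F\x41"
buffer = io.StringIO()
float32_records_to_csv(sample, buffer)
print(buffer.getvalue())
```

With a real file, the bytes would come from open(path, 'rb').read() and the output would go to a file opened with newline='' as the csv module documentation recommends.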