I have a binary file format with a bunch of headers and floating-point data. I am writing code that parses the binary file. Reading the headers was not hard, but when I tried to read the data I ran into some difficulties.

I opened the file and read the headers as follows:

ifs.open(fileName, std::ifstream::in | std::ifstream::binary);
char textHeader[3200];
BinaryHeader binaryHeader;
ifs.read(textHeader,sizeof(textHeader));
ifs.read(reinterpret_cast<char *>(&binaryHeader), sizeof(binaryHeader));

The documentation says the data is stored as 4-byte IBM floating point, so I tried something like this:

vector<float> readData(int sampleSize){
    float tmp;
    std::vector<float> tmpVector;
    for (int i = 0; i<sampleSize; i++){
        ifs.read(reinterpret_cast<char *>(&tmp), sizeof(tmp));
        std::cout << tmp << std::endl;
        tmpVector.push_back(tmp);
    }
    return tmpVector;
}

Sadly, the result does not seem correct. What am I doing wrong?

EDIT: I forgot to mention that the binary data is big-endian, but the printed tmp values do not look correct either way (with or without swapping the bytes).
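Since the file is big-endian, one portable way to read each 32-bit word is to read four raw bytes and combine them with shifts, which gives the same result on little- and big-endian hosts. A minimal sketch; `readBigEndianU32` is a hypothetical helper, not part of the original code:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the helper
#include <string>

// Hypothetical helper: reads 4 bytes from `in` and combines them with the
// most significant byte first (big-endian), independent of the host's byte order.
// Returns false if 4 bytes could not be read.
bool readBigEndianU32(std::istream& in, std::uint32_t& out) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), sizeof b)) {
        return false;
    }
    out = (static_cast<std::uint32_t>(b[0]) << 24) |
          (static_cast<std::uint32_t>(b[1]) << 16) |
          (static_cast<std::uint32_t>(b[2]) << 8) |
          static_cast<std::uint32_t>(b[3]);
    return true;
}
```

For the bytes 12 34 56 78 this yields 0x12345678. Fixing the byte order only recovers the correct 32-bit pattern, though; that pattern still has to be decoded in the file's floating-point format.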

Conclusion: the 4-byte IBM floating-point format is not the same as a float (IEEE 754 single precision).
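For reference, the commonly documented IBM System/360 single-precision (hexadecimal floating-point) layout is: 1 sign bit s, a 7-bit exponent e stored in excess-64 and interpreted as a power of 16, and a 24-bit fraction f with no implied leading bit. Please double-check this against the specification your documentation refers to. Under that layout the value is

$$
v = (-1)^{s}\,\frac{f}{2^{24}}\,16^{\,e-64}
$$

and, as a worked example,

$$
\texttt{0xC276A000}:\quad s=1,\; e=\texttt{0x42}=66,\; f=\texttt{0x76A000}=7774208
\;\Rightarrow\; v = -\frac{7774208}{2^{24}}\cdot 16^{2} = -0.46337890625\cdot 256 = -118.625
$$

An IEEE 754 float instead uses an 8-bit base-2 exponent biased by 127 and an implied leading 1, so the same bits decode to a different number.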

It's hard to tell exactly, but at first glance there might be two issues. First, you have an array of char for your header; this might be okay, but have you tried using unsigned char instead? The second possible issue is that your readData() function creates a temporary vector on its stack frame and then returns it. Maybe try changing the signature of this function to accept a std::vector<float> by reference and pass it in, instead of returning a copy of a temporary. - Francis Cugler
Is the binary data big- or little-endian? - Retired Ninja
“4-byte IBM floating-point” is probably not the same as float. If that's the case, you'll have to do some work to translate the input into something your hardware can work with. - Pete Becker
Google “4-byte IBM floating-point”. There's lots of information out there. And, as I guessed earlier, it's not the same layout as an IEEE float. - Pete Becker

1 Answer

There are a few things to consider:

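  • The first is minor: you are using an array of char for your header (char textHeader[3200];). You could use an array of unsigned char instead. It makes no difference to ifs.read, which copies the bytes unchanged either way, but if you later inspect individual bytes, a plain char may be signed on your platform, so bytes above 0x7F would show up as negative values.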

  • The second concerns your readData function. It builds a local std::vector<float> on that function's stack frame and returns it by value. That is actually safe: the vector is moved out (or the copy is elided entirely), so the caller never sees a destroyed object, and since C++11 there is no expensive element-by-element copy either. Passing the vector in by reference is still worth considering if you call readData repeatedly and want to reuse the same buffer, so I would change the declaration and definition of this function.

    I would change it from what you currently have:

    vector<float> readData(int sampleSize)

    to this:

    void readData(int sampleSize, std::vector<float>& data)

  • The third, and probably the most important of the three, was raised as a question in your comments by Retired Ninja while I was writing this: what is the endianness of the stored data? Byte order can be a major factor, but I think the biggest concern is how the data is actually represented in the bytes that are stored.
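To see why the values look wrong even with the byte order fixed, take the word 0x41100000, which in IBM single precision encodes 1.0. A minimal sketch that reinterprets the same 32 bits as a float, assuming float is a 32-bit IEEE 754 type; `bitsAsIeeeFloat` is a hypothetical helper:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reuses 32 bits as an IEEE 754 float, the same bit reuse
// that ifs.read(reinterpret_cast<char*>(&tmp), sizeof(tmp)) performs once the
// byte order is right. Assumes float is a 32-bit IEEE 754 type.
float bitsAsIeeeFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // well-defined way to reuse the bit pattern
    return f;
}
```

bitsAsIeeeFloat(0x41100000) returns 9.0f, while the same word read as IBM single precision is 1.0 (exponent 0x41 - 64 = 1, fraction 0x100000 / 2^24 = 1/16, times 16^1). Swapping bytes alone cannot fix that mismatch.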

Since your documentation states that the data is stored as a 4-byte IBM floating-point type in big-endian byte order, I found this specification by IBM that may help you.
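Based on that layout, here is a minimal sketch of a by-reference readData that decodes each big-endian word as IBM single precision. It is not taken from the IBM document or any library: `ibmToFloat` is a hypothetical helper, the extra `std::istream&` parameter stands in for the question's `ifs`, and values too large for a float are mapped to infinity.

```cpp
#include <cmath>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying it out
#include <string>
#include <vector>

// Hypothetical helper: converts one IBM single-precision word (already
// assembled from its big-endian bytes) to a native float.
// Layout: 1 sign bit, 7-bit base-16 exponent biased by 64, 24-bit fraction.
float ibmToFloat(std::uint32_t ibm) {
    const bool negative = (ibm >> 31) != 0;
    const int exponent = static_cast<int>((ibm >> 24) & 0x7Fu) - 64;
    const std::uint32_t fraction = ibm & 0x00FFFFFFu;

    // value = fraction / 2^24 * 16^exponent = fraction * 2^(4 * exponent - 24)
    double value = std::ldexp(static_cast<double>(fraction), 4 * exponent - 24);

    // IBM's range exceeds float's; map overflow to infinity instead of
    // converting an out-of-range double to float.
    if (value > static_cast<double>(std::numeric_limits<float>::max())) {
        value = std::numeric_limits<double>::infinity();
    }
    return static_cast<float>(negative ? -value : value);
}

// By-reference signature as suggested in the answer, plus the stream to read from.
// Reads up to `sampleSize` big-endian IBM floats; stops early at end of data.
void readData(std::istream& ifs, int sampleSize, std::vector<float>& data) {
    data.clear();
    for (int i = 0; i < sampleSize; ++i) {
        unsigned char b[4];
        if (!ifs.read(reinterpret_cast<char*>(b), sizeof b)) {
            break;
        }
        const std::uint32_t word = (static_cast<std::uint32_t>(b[0]) << 24) |
                                   (static_cast<std::uint32_t>(b[1]) << 16) |
                                   (static_cast<std::uint32_t>(b[2]) << 8) |
                                   static_cast<std::uint32_t>(b[3]);
        data.push_back(ibmToFloat(word));
    }
}
```

With a std::ifstream opened in binary mode and positioned after the headers, readData(ifs, sampleSize, data) fills data with decoded samples; if fewer samples are available, data ends up shorter than sampleSize. The word 0xC276A000, for example, decodes to -118.625.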