
I have been tasked with parsing out (via C#) an image from legacy binary files with a format that's around 20 years old now; the image data is embedded in the binary file and is prefixed by a hex flag. Below is the definition of the flags I am looking for (in C):

#define C_THUMBNAIL    0x0008        /* thumbnail bitmap */
#define C_CTHUMBNAIL   0x000d        /* compressed thumbnail bitmap */

How do I find one of these flags (are they even flags?) in the file? If I can figure out where the flag is and how to read the value coming after the flag (the size of the image in bytes), I can do what I need. This is what I have so far:

// Dispose the reader (and the underlying file handle) when done.
using (var binReader = new BinaryReader(File.OpenRead(fileLocation)))
{
    // 1. find flag
    // 2. get image size in bytes
    // 3. take the slice of the byte array containing the image
    // 4. write that slice of the array to a .png file
}

My original idea was to walk through the binary stream until I found the flag, but I'm really confused about how, if my hex flag is the number 8 (0x0008 == 8, right?), I'm supposed to find it in the file and tell it apart from all the other 8s in the file.

Sorry if this is a duplicate question, but I don't know enough about this problem to know what to research to solve it. I've read the MSDN documentation on binary files and read some similar questions here, but can't tell if they answer my question.

You need to know the structure of the file in order to find the flag values: at what offset the flag is located, or what binary layout the file follows. Otherwise you have no way of knowing which bytes represent your flags. - Kevin

2 Answers

0
votes

Your suspicions are correct.

0x0008 is 8.

You cannot simply look for the flag in the file, because the value 8 will also turn up in unrelated data.

You'll need to find a document detailing how the file format is defined.

For example, "tar" (an archive format, not itself a compressed one) has a precisely specified header: every field, including the type flag, sits at a known position, and that is what makes a tar file parseable.
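To show why the layout matters, here is a minimal sketch of what the parsing could look like once you have the specification. The layout used here is purely an assumption for illustration: a sequence of records, each made of a 16-bit little-endian tag, a 32-bit little-endian payload length, and then the payload. The `RecordWalker` class and `FindThumbnail` method are hypothetical names; your real format may well differ.

```csharp
using System;
using System.IO;
using System.Text;

static class RecordWalker
{
    public const ushort C_THUMBNAIL = 0x0008;   // from the question
    public const ushort C_CTHUMBNAIL = 0x000d;  // from the question

    // ASSUMED layout (not confirmed by any spec):
    // [ushort tag][uint length][length bytes of payload], little-endian.
    // Requires a seekable stream. Returns null if no thumbnail record is found.
    public static byte[] FindThumbnail(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            // 6 bytes = tag (2) + length (4)
            while (stream.Length - stream.Position >= 6)
            {
                ushort tag = reader.ReadUInt16();   // BinaryReader reads little-endian
                uint length = reader.ReadUInt32();

                // A length that runs past the end of the file (or cannot fit in an
                // array) means the file is corrupt or the layout assumption is wrong.
                if (length > stream.Length - stream.Position || length > int.MaxValue)
                    return null;

                if (tag == C_THUMBNAIL || tag == C_CTHUMBNAIL)
                    return reader.ReadBytes((int)length);

                // Not the record we want: skip its payload.
                stream.Seek(length, SeekOrigin.Current);
            }
        }
        return null;
    }
}
```

With a layout like this, the "other 8s" stop being a problem: you never search for the value, you only ever read it at positions where the format says a tag must be.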

0
votes

If that is the only information you have about how to find the embedded image data, then this will be hard. "Prefixed by a flag" could mean many things. It could mean a leading 0x8 (1000 binary) or 0xd (1101 binary) byte, but it could also be part of a bit mask.
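To make those interpretations concrete, the sketch below shows what each one would look like. It assumes the flag is stored as a 16-bit value (the defines are written with four hex digits, but that is only a guess at the on-disk width); `FlagForms` and its methods are illustrative names, not part of any library.

```csharp
using System;

static class FlagForms
{
    public const ushort C_THUMBNAIL = 0x0008;   // from the question
    public const ushort C_CTHUMBNAIL = 0x000d;  // from the question

    // Bytes as they would appear in the file if the value is stored little-endian.
    public static byte[] LittleEndian(ushort v) => new[] { (byte)(v & 0xFF), (byte)(v >> 8) };

    // Bytes as they would appear in the file if the value is stored big-endian.
    public static byte[] BigEndian(ushort v) => new[] { (byte)(v >> 8), (byte)(v & 0xFF) };

    // Bit-mask interpretation: are all bits of 'mask' set in 'field'?
    public static bool HasBits(ushort field, ushort mask) => (field & mask) == mask;
}
```

One observation that may help: 0x000d has the 0x0008 bit set, so `FlagForms.HasBits(FlagForms.C_CTHUMBNAIL, FlagForms.C_THUMBNAIL)` is true. A bit test for the plain thumbnail would therefore also match the compressed one, which hints that these are more likely enumerated type codes than independent bit flags. That is an inference from the two values, not something the defines confirm.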

If you have no other information about the header that precedes the image, you could find every 0x8 or 0xd byte, assume each one marks a possible start of the image data, extract the image, and check whether the result is a sensible image. However, there could be a lot of candidates. You may be able to narrow the search if you know roughly where the data sits in the file.
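A brute-force scan along those lines might look like the sketch below. It rests on two guesses that you would need to verify against real files: that the flag is a 16-bit little-endian value, and that it is immediately followed by a 32-bit little-endian size. Candidates whose size is zero or runs past the end of the file are discarded, which usually removes most false hits. `CandidateScan` and `FindCandidates` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class CandidateScan
{
    // ASSUMPTION: [16-bit little-endian flag][32-bit little-endian size][size bytes].
    // Returns the offsets at which a plausible flag + size pair starts.
    public static List<int> FindCandidates(byte[] data, ushort flag)
    {
        var hits = new List<int>();
        byte lo = (byte)(flag & 0xFF);
        byte hi = (byte)(flag >> 8);

        // Need 6 bytes at position i: 2 for the flag, 4 for the size.
        for (int i = 0; i + 6 <= data.Length; i++)
        {
            if (data[i] != lo || data[i + 1] != hi)
                continue;

            // Assemble the size by hand so the result does not depend on
            // the endianness of the machine running the code.
            uint size = (uint)data[i + 2]
                      | ((uint)data[i + 3] << 8)
                      | ((uint)data[i + 4] << 16)
                      | ((uint)data[i + 5] << 24);

            int remaining = data.Length - (i + 6);
            if (size > 0 && size <= remaining)
                hits.Add(i);
        }
        return hits;
    }
}
```

For a small file you could load it with `File.ReadAllBytes(fileLocation)` and call `CandidateScan.FindCandidates(bytes, 0x0008)`, then inspect each candidate by hand or try to decode it.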

This also requires that you know the actual image format. If you don't, you are pretty much lost unless it is some form of plain RGB bitmap or similar.
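If the payload happens to be stored in a standard container, its first bytes can tell you which one. The sketch below checks for the well-known PNG, JPEG and BMP signatures; a raw RGB bitmap has no signature at all, so "unknown" does not mean the data is not an image. `ImageSniffer` and `GuessFormat` are illustrative names.

```csharp
using System;

static class ImageSniffer
{
    // True if 'data' begins with the bytes in 'magic'.
    static bool StartsWith(byte[] data, byte[] magic)
    {
        if (data.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
            if (data[i] != magic[i]) return false;
        return true;
    }

    // Returns "png", "jpeg", "bmp", or "unknown" based on the leading bytes.
    public static string GuessFormat(byte[] data)
    {
        if (StartsWith(data, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }))
            return "png";
        if (StartsWith(data, new byte[] { 0xFF, 0xD8, 0xFF }))
            return "jpeg";
        if (StartsWith(data, new byte[] { 0x42, 0x4D }))   // ASCII "BM"
            return "bmp";
        return "unknown";
    }
}
```

The two-byte BMP check is weak on its own, so treat a "bmp" result as a hint to verify rather than as proof.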