4
votes

I've never dealt with a binary file containing multiple data types in Python, and I was hoping to get some direction. The binary file contains the following data types:

String
Byte
UInt8 -Size in bytes: 1- 8-bit unsigned integer.
UInt16 -Size in bytes: 2- Little-endian encoded 16 bit unsigned integer.
UInt32 -Size in bytes: 4- Little-endian encoded 32 bit unsigned integer.
UInt64 -Size in bytes: 8- Little-endian encoded 64 bit unsigned integer.
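
For reference, the fixed-size integer types in this list map directly onto format codes in the standard struct module. The following is only a minimal Python 3 sketch: FORMATS and read_uint are illustrative names (not part of any library), and the sample bytes are made up.

```python
import struct

# struct format codes for the unsigned integer types listed above.
# "<" selects little-endian byte order with no padding.
FORMATS = {
    "UInt8": "<B",
    "UInt16": "<H",
    "UInt32": "<I",
    "UInt64": "<Q",
}


def read_uint(data, offset, type_name):
    """Return (value, next_offset) for one unsigned integer stored in data."""
    fmt = FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)


# Made-up sample: a UInt8, then a UInt16, then a UInt32.
sample = bytes([0x01, 0x02, 0x00, 0x03, 0x00, 0x00, 0x00])
first, pos = read_uint(sample, 0, "UInt8")
second, pos = read_uint(sample, pos, "UInt16")
third, pos = read_uint(sample, pos, "UInt32")
print(first, second, third, pos)
```

Returning the next offset alongside the value makes it easy to walk through a buffer field by field without tracking sizes by hand.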

What I've been unsuccessful at is decoding my data properly. The data uses a common message format that serves as a wrapper to deliver one or more higher-level messages. The field names contained in this wrapper are listed below.

Within this message I can have:
Length - Offset 0 - Size 2 - Type UInt16
Message Count - Offset 2 - Size 1 - Type UInt8
ID - Offset 3 - Size 1 - Type Byte
Sequence - Offset 4 - Size 4 - Type UInt32
Payload - Offset 8

Here the length specifies the length of the common message, and the message count tells how many higher-level messages begin in the Payload.
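
The wrapper layout described here could be read with a single struct format string. This is a hedged Python 3 sketch, not a tested decoder for the real feed: it assumes Length counts the whole common message including the 8 header bytes, it reads the ID byte as a small integer, and HEADER, Header, and parse_header are hypothetical names. The packet is synthetic.

```python
import struct
from collections import namedtuple

# Layout from the field list: UInt16 length, UInt8 message count,
# 1-byte ID, UInt32 sequence, all little-endian ("<"), 8 bytes total.
HEADER = struct.Struct("<HBBI")
Header = namedtuple("Header", "length count id sequence")


def parse_header(data):
    """Split one common message into its header fields and payload bytes."""
    header = Header(*HEADER.unpack_from(data, 0))
    # Assumption: length covers the header too, so the payload ends there.
    payload = data[HEADER.size:header.length]
    return header, payload


# Synthetic packet: 8-byte header followed by a 4-byte payload.
packet = struct.pack("<HBBI", 12, 1, 0x41, 7) + b"\x04\x43hi"
header, payload = parse_header(packet)
print(header, payload)
```

If Length turns out to exclude the header, only the slice in parse_header needs to change.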

The higher level message begins in Payload with the following characteristics

Message Length - Offset 0 - Size 1 - Type UInt8
Message Type - Offset 1 - Size 1 - Type Byte
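
Given that two-byte header, the higher-level messages in a payload could be walked one after another. The Python 3 sketch below rests on an assumption the description does not confirm: that Message Length counts the whole higher-level message, including its own two header bytes. SUB_HEADER and iter_messages are hypothetical names, and the payload is synthetic.

```python
import struct

# UInt8 message length followed by a 1-byte message type.
SUB_HEADER = struct.Struct("<BB")


def iter_messages(payload, count):
    """Yield (message_type, body) for each higher-level message in payload."""
    offset = 0
    for _ in range(count):
        msg_length, msg_type = SUB_HEADER.unpack_from(payload, offset)
        if msg_length < SUB_HEADER.size:
            raise ValueError("message length smaller than its own header")
        # Assumption: msg_length includes the two header bytes.
        body = payload[offset + SUB_HEADER.size:offset + msg_length]
        yield msg_type, body
        offset += msg_length


# Synthetic payload holding two messages of types 0x43 and 0x44.
payload = b"\x04\x43hi" + b"\x03\x44!"
messages = list(iter_messages(payload, 2))
print(messages)
```

The count argument would come from the Message Count field of the wrapper; once the type of each message is known, its body can be decoded with a type-specific format string.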

Once I'm able to figure out what the Message Type is in each higher-level message, the rest is trivial. I've been trying to create a BinaryReader class to do this for me, but I haven't been able to use struct.unpack successfully.

EDIT: This is an example of the common message
('7x\xecM\x00\x00\x00\x00\x15.\x90\xf1\xc64CIDM')
and the higher level message inside it
('C\x01dC\x02H\x00\x15.\xe8\xf3\xc64CIEN')

2
Specifically, what problems do you have with struct.unpack()? Start with the simplest failing case you can come up with, showing exactly what you did, what you wanted, and what you got. - Tim Peters
Well, I've added the two examples I'm working with, if that helps. - FancyDolphin
It's a start. Now what did you try, what did you want from it, and what did you get? For example, struct.unpack("<H", '7x\xecM\x00\x00\x00\x00\x15.\x90\xf1\xc64CIDM'[:2]) returns (30775,). Is that, or is that not, what you want for the "length" field? It is the proper result for interpreting '7x' as a little-endian 16-bit unsigned int. - Tim Peters
Sorry, I don't understand you. What - exactly - did you expect to return 2533? Show executable code, like I showed you. There is no way '7x' can be interpreted as 2533 as any kind of integer (regardless of endianness or byte size or signed vs unsigned). For your second example, again show code. Don't try to describe what you did - it's not working for you ;-) As to your last example, why do you imagine the string is UTF-8 encoded to begin with? As to what happened when you "try struct.unpack", show code. Can't guess what "try" means. - Tim Peters
OK, try reading the struct docs first - passing "=" as a code means "unpack no data". It doesn't make sense. Since you told it your string has no data in it, it complains because you passed a non-empty string. struct.unpack('=', '') would return an empty tuple, which is correct. In binary data, \n is just another character. Binary data has no "line ends". It's a stream of bytes. To unpack a byte, use code b for a signed byte or B for unsigned. struct.unpack('B', '\x01') returns (1,) (a tuple containing the integer 1), which is correct. Try simple things first? - Tim Peters

2 Answers

3
votes

Construct is a great library for parsing binary data.


You might use it something like this:

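# Uses the older Construct 2.5-style API (named fields);
# newer Construct releases changed this syntax.
from construct import *

# The question specifies little-endian integers, so use the UL* fields.
message = Struct("wrapper",
    ULInt16("length"),
    ULInt8("count"),
    Byte("id"),
    ULInt32("sequence"),
    # The message count (not the wrapper length) gives the number of messages.
    Array(lambda ctx: ctx.count,
        Struct("message",
            ULInt8("length"),
            ULInt8("type"),
            # Here ctx.length is the inner message's own length field.
            # Subtract 2 if that length includes the two header bytes.
            Bytes("content", lambda ctx: ctx.length),
        ),
    ),
)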
1
votes

I think you could use the bitstring module for Python: http://code.google.com/p/python-bitstring/
It provides several nice features, including format strings for binary data.

Here you can find more about reading data and format strings.
http://pythonhosted.org/bitstring/reading.html#reading-using-format-strings
http://pythonhosted.org/bitstring/constbitstream.html#bitstring.ConstBitStream.read
http://pythonhosted.org/bitstring/constbitstream.html#bitstring.ConstBitStream.readlist

This code may give you an idea of a solution using bitstring.

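from bitstring import BitStream
bs = BitStream(bytes=your_binary_data)

# Read the wrapper header; bs.pos (in bits) advances past each field read.
length, message_count, msg_id, sequence = bs.readlist('uintle:16, uintle:8, bytes:1, uintle:32')
# Everything after the current position is the payload.
payload = bs[bs.pos:]
message_length, message_type = payload.readlist('uintle:8, bytes:1')
# What follows the two-byte message header.
rest_of_data = payload[payload.pos:]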