I'm having trouble understanding the Deflate algorithm (RFC 1951).
TL;DR: How do I parse the Deflate compressed block 4be4 0200?
I created a file containing a single letter and a newline, a\n, and ran gzip a.txt. The resulting file a.txt.gz:
1f8b 0808 fe8b eb55 0003 612e 7478 7400
4be4 0200
07a1 eadd 0200 0000
I understand that the first line is the gzip header with additional information, and the last line is the CRC32 plus the size of the input (both are part of the gzip format, RFC 1952, not of Deflate itself). These two give me no trouble.
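For reference, here is a minimal sketch of how I split the file into those three parts. It assumes a single gzip member with only the FNAME flag set (which is what this file has); the function name split_gzip is my own, not something from the RFCs.

```python
import struct
import zlib

raw = bytes.fromhex(
    "1f8b 0808 fe8b eb55 0003 612e 7478 7400"
    "4be4 0200"
    "07a1 eadd 0200 0000"
)


def split_gzip(raw):
    """Split one gzip member into (name, deflate_body, crc32, isize).

    Sketch only: handles the FNAME flag and rejects the other optional fields.
    """
    magic, cm, flg, mtime, xfl, os_ = struct.unpack_from("<2sBBIBB", raw, 0)
    if magic != b"\x1f\x8b" or cm != 8:
        raise ValueError("not a gzip/deflate member")
    if flg & ~0x08:
        raise NotImplementedError("only the FNAME flag is handled in this sketch")
    pos = 10
    name = None
    if flg & 0x08:  # FNAME: zero-terminated original file name
        end = raw.index(b"\x00", pos)
        name = raw[pos:end].decode("latin-1")
        pos = end + 1
    # The trailer is the last 8 bytes: CRC32 and ISIZE, both little-endian.
    crc32, isize = struct.unpack("<II", raw[-8:])
    return name, raw[pos:-8], crc32, isize


name, body, crc32, isize = split_gzip(raw)
print(name, body.hex(), hex(crc32), isize)
# The trailer CRC can be compared with the CRC of the original input:
print(hex(zlib.crc32(b"a\n")))
```

What is left in body is exactly the four bytes 4be4 0200 that I am asking about.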
But how do I interpret the compressed block itself (the middle line)?
Here are its hexadecimal and binary representations:
4be4 0200
0100 1011
1110 0100
0000 0010
0000 0000
As far as I understand, somehow these bits:
Each block of compressed data begins with 3 header bits containing the following data:
- first bit BFINAL
- next 2 bits BTYPE
...actually ended up at the end of the first byte: 0100 1011. (I'll skip the question of why anyone would call something a "header" when it sits at the tail of something else.)
The RFC contains something that, as far as I understand, is supposed to explain this:
- Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the least-significant bit of the byte.
- Data elements other than Huffman codes are packed starting with the least-significant bit of the data element.
- Huffman codes are packed starting with the most-significant bit of the code.
In other words, if one were to print out the compressed data as a sequence of bytes, starting with the first byte at the right margin and proceeding to the left, with the most-significant bit of each byte on the left as usual, one would be able to parse the result from right to left, with fixed-width elements in the correct MSB-to-LSB order and Huffman codes in bit-reversed order (i.e., with the first bit of the code in the relative LSB position).
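To make that paragraph concrete, here is a small sketch that does literally what it describes for my four bytes: print the bytes right to left, then pick fields off the right-hand end. The variable names are mine, and the assumption that the first Huffman code is 8 bits long only holds if the block uses the fixed codes and the first symbol is a literal in the range 0-143.

```python
data = bytes.fromhex("4be40200")

# First byte at the right margin, later bytes to its left,
# most-significant bit of each byte on the left.
rtl = "".join(f"{b:08b}" for b in reversed(data))
print(rtl)  # 00000000000000101110010001001011

# Parse from right to left.
bfinal = int(rtl[-1], 2)    # rightmost bit
btype = int(rtl[-3:-1], 2)  # next two bits, read as written (MSB-to-LSB)

# A Huffman code appears bit-reversed in this view, so flip it back.
# Assumption: the code is 8 bits long (fixed Huffman, literal 0-143).
first_code = rtl[-11:-3][::-1]

print(bfinal, f"{btype:02b}", first_code)  # 1 01 10010001
```

Read this way, the "header" really is at the start of the bit stream: it is at the low-order end of the first byte, which only looks like the tail because 0100 1011 is written MSB first.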
But sadly I don't understand that explanation.
Returning to my data. OK, so BFINAL is set, and BTYPE is what? 10 or 01?
How do I interpret the rest of the data in that compressed block?
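Below is my attempt at a sketch that applies the three packing rules to the whole block. It assumes BTYPE turns out to be 01 (fixed Huffman codes, RFC 1951 section 3.2.6) and only handles literals and the end-of-block symbol, with no length/distance pairs. BitReader, decode_fixed_symbol and inflate_fixed_literals are names I made up for illustration.

```python
class BitReader:
    """Reads bits from bytes starting with the least-significant bit."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def bit(self):
        b = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
        self.pos += 1
        return b

    def bits(self, n):
        # Fixed-width field: the first bit read is the field's LSB.
        v = 0
        for i in range(n):
            v |= self.bit() << i
        return v

    def code(self, n, start=0):
        # Huffman code: the first bit read is the code's MSB.
        v = start
        for _ in range(n):
            v = (v << 1) | self.bit()
        return v


def decode_fixed_symbol(r):
    """Decode one literal/length symbol using the fixed Huffman table."""
    c = r.code(7)
    if c <= 0b0010111:                  # 7-bit codes: symbols 256-279
        return 256 + c
    c = r.code(1, c)                    # extend to 8 bits
    if 0b00110000 <= c <= 0b10111111:   # literals 0-143
        return c - 0b00110000
    if 0b11000000 <= c <= 0b11000111:   # symbols 280-287
        return 280 + (c - 0b11000000)
    c = r.code(1, c)                    # extend to 9 bits
    return 144 + (c - 0b110010000)      # literals 144-255


def inflate_fixed_literals(data):
    r = BitReader(data)
    bfinal = r.bits(1)
    btype = r.bits(2)
    if btype != 0b01:
        raise NotImplementedError("sketch handles fixed Huffman blocks only")
    out = bytearray()
    while True:
        sym = decode_fixed_symbol(r)
        if sym == 256:                  # end of block
            break
        if sym > 256:
            raise NotImplementedError("length/distance pairs not handled")
        out.append(sym)
    return bfinal, btype, bytes(out)


print(inflate_fixed_literals(bytes.fromhex("4be40200")))  # (1, 1, b'a\n')
```

If this reading is right, the block is BFINAL=1, BTYPE=01, then the 8-bit codes 10010001 ('a') and 00111010 ('\n'), then the 7-bit end-of-block code 0000000, followed by six padding bits. As a cross-check, zlib.decompress(bytes.fromhex("4be40200"), -15) decodes the same bytes as a raw Deflate stream.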