my hope for this question (see: bottom) is to lay out as much as I know about the deflate process, and I can receive corrections on areas where I am (perhaps very) misinformed. Hopefully, at the end of it, this question could be a handy resource.
Zlib header
The first two bytes equate to a header for the zlib compression used with the format of (credit)
---CMF--- ---FLG---
0111.1000 1101.0101
CINF -CM- +-||
| |+- FCHECK
| +-- FDICT
+---- FLEVEL
From RFC 1950, right to left:
FCHECK (1.0101) - validates that CMF & FLG as a 16 bit unsigned int is a multiple of 31
FDICT (0) - if set, indicates preset DICT following immediately after FLG
FLEVEL (11) - compression "intensity" [0-3]
CM (1000) - for compression method, where CM = 8 == "deflate" compression method
CINF (0111) - indicates the size of the sliding window used, where CINF = 7 == 32K sliding window
Data block header
The next three bits in the NEW BYTE equate to a header for the Huffman encoded block:
---CMF--- ---FLG--- NEW BYTE
0111.1000 1101.0101 11101100
|-|
| +- BFINAL
+--- BTYPE
From RFC 1951 right to left:
BFINAL (0) - is set (1) if this is the last block of data
BTYPE (10) - Huffman encoding : (00)none; (01)Fixed Huffman codes; (10) dynamic codes; (11) invalid
The Huffman Codes
From here I will work off the assumption of BTYPE = (10)
The following values immediately proceed:
NEW BYTE NXT BYTE
(11101)100 -> 101)(11101) -> 0111111(1
|-|
| +- BFINAL
+--- BTYPE
HLIT
(11101) - 5-bit number of length/literal codes, 257 added (257-286)HDIST
(11101) - 5-bit number of distance codes, 1 added (1-32)HCLEN
(1111) - 4-bit number of code-length codes, 4 added (4-19)
Immediately following this is HCLEN
(don't forget +4) 3-bit fields, where the values are assigned to this sequence, in order:
16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15
Since HCLEN = 19, the whole sequence is used
A code length of 0
in that sequence means the corresponding symbol is not used.
As a graphical example, after reading the 19x3 bits we have six extra bits(extra bits in brackets):
NXT BYTE 00000000 00000000 00000000 00000000 00000000 00000000 [000000](00
My Question
Do the last bits in brackets above, get thrown away?
0x78
is identical to my example above (CMF byte containing CINF and CM) – Trés DuBiel