Recreating ZLIB from scratch

Question

I am trying to understand how the zlib algorithm, as implemented in Python, works. I understand that it uses a variant of DEFLATE. I am wondering whether it is possible to go by hand from the given data to the compressed data.

Data: 00    Compressed Data: 63 00 00    bin: 01100011 00000000 00000000
Data: 01    Compressed Data: 63 04 00    bin: 01100011 00000100 00000000
Data: 02    Compressed Data: 63 02 00    bin: 01100011 00000010 00000000
Data: 03    Compressed Data: 63 06 00    bin: 01100011 00000110 00000000
Data: 04    Compressed Data: 63 01 00    bin: 01100011 00000001 00000000

The above data was compressed with zlib at compression level 1, and then the 78 01 header and the trailing Adler-32 checksum were stripped. So what is left should be the compressed data from DEFLATE (if I am not wrong).
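For reference, streams like the ones above could be produced roughly as follows. This is a minimal sketch using Python's standard zlib module; the helper names raw_deflate and raw_inflate are made up for illustration, and the exact bytes printed may depend on the zlib build in use.

```python
import zlib


def raw_deflate(data: bytes, level: int = 1) -> bytes:
    """Compress with zlib, then drop the 2-byte header and 4-byte Adler-32 trailer."""
    stream = zlib.compress(data, level)
    return stream[2:-4]


def raw_inflate(body: bytes) -> bytes:
    """Decode a headerless DEFLATE stream (negative wbits means no zlib wrapper)."""
    return zlib.decompress(body, wbits=-15)


# Print the stripped stream for each one-byte input from 00 to 04.
for value in range(5):
    body = raw_deflate(bytes([value]))
    print(f"{value:02x} -> {body.hex(' ')}")
```

Passing a stripped stream back through raw_inflate is a quick way to confirm that nothing but the wrapper was removed.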

Numbering the bits as such,

+--------+
|76543210|
+--------+

and the bytes as follows,

    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+

This numbering is taken from the DEFLATE standard (RFC 1951).

I understand that bit 0 of the first byte marks the last block of the file, and that bits 1 and 2 indicate that DEFLATE is using mode 10 compression. But I am unable to work out what the rest of the bits mean or how to compute them by hand.

Below is a longer example,

Data: 01 01 01 01 02 02    Compressed Data: 63 64 64 64 64 62 02 00    bin: 01100011 01100100 01100100 01100100 01100100 01100010 00000010 00000000

Mark Adler · Accepted Answer · 2020-06-23T17:34:57

zlib does not use a "variant" of DEFLATE. It uses DEFLATE.

The DEFLATE compressed data format is fully described in RFC 1951.

You can use infgen to disassemble DEFLATE streams. Example output for your first one:

! infgen 2.4 output
!
last
fixed
literal 0
end

It says that the first block is also the last block, that it is a fixed-code block, and that there is a literal zero byte as the data contained in the block.
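To see where those lines come from, here is a rough sketch of a decoder for this one case. It is not how infgen works internally, and the function names are made up for illustration. It reads the header bits low bit first (bit 0 is the last-block flag; bits 1 and 2 of 0x63 are 1 and 0, which read low bit first give block type 01, the fixed codes), then walks the fixed Huffman table from RFC 1951 section 3.2.6, whose codes are packed most significant bit first. Length/distance symbols are deliberately left unhandled.

```python
def bits_lsb_first(data: bytes):
    """Yield the bits of each byte starting from bit 0 (RFC 1951, section 3.1.1)."""
    for byte in data:
        for i in range(8):
            yield (byte >> i) & 1


def disassemble_fixed(data: bytes) -> list:
    """Return infgen-like lines for a single fixed-code block of literals."""
    bits = bits_lsb_first(data)
    out = []
    if next(bits):                            # BFINAL
        out.append("last")
    btype = next(bits) | (next(bits) << 1)    # BTYPE, low bit arrives first
    if btype != 1:
        raise NotImplementedError("only fixed-code blocks (BTYPE=01) are handled")
    out.append("fixed")
    while True:
        code = 0
        for _ in range(7):                    # Huffman codes arrive MSB first
            code = (code << 1) | next(bits)
        if code <= 0b0010111:                 # 7-bit codes: symbols 256..279
            symbol = 256 + code
        else:
            code = (code << 1) | next(bits)
            if 0b00110000 <= code <= 0b10111111:      # 8-bit: symbols 0..143
                symbol = code - 0b00110000
            elif 0b11000000 <= code <= 0b11000111:    # 8-bit: symbols 280..287
                symbol = 280 + (code - 0b11000000)
            else:                                     # 9-bit: symbols 144..255
                code = (code << 1) | next(bits)
                symbol = 144 + (code - 0b110010000)
        if symbol == 256:
            out.append("end")
            return out
        if symbol > 256:
            raise NotImplementedError("length/distance codes are not handled")
        out.append(f"literal {symbol}")


print("\n".join(disassemble_fixed(bytes.fromhex("630000"))))
```

For 63 00 00 this yields last, fixed, literal 0, end. The five bits left in the first byte after the header are the start of the 8-bit code 00110000 for literal 0, and the seven zero bits that follow are the end-of-block code.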

You can also look at the source code for infgen to assist in your understanding of RFC 1951.
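Going the other way, from data to compressed data, can be sketched by hand for the simplest case: one final fixed-code block containing only literals. The code below is an illustration under that assumption, not what zlib's deflate does in general (zlib also searches for repeated strings and chooses between stored, fixed and dynamic blocks). The names fixed_code and encode_fixed_literals are made up for this example.

```python
import zlib


def fixed_code(symbol: int) -> tuple:
    """Return (code, bit_length) from the fixed table in RFC 1951, section 3.2.6."""
    if symbol <= 143:
        return 0b00110000 + symbol, 8
    if symbol <= 255:
        return 0b110010000 + (symbol - 144), 9
    if symbol <= 279:
        return symbol - 256, 7
    return 0b11000000 + (symbol - 280), 8


def encode_fixed_literals(data: bytes) -> bytes:
    """Encode data as one final fixed-code block using literals only."""
    bits = [1, 1, 0]                          # BFINAL=1, then BTYPE=01 low bit first
    for symbol in list(data) + [256]:         # 256 is the end-of-block symbol
        code, length = fixed_code(symbol)
        # Huffman codes go into the stream most significant bit first.
        bits.extend((code >> i) & 1 for i in reversed(range(length)))
    bits.extend([0] * (-len(bits) % 8))       # pad the last byte with zero bits
    out = bytearray()
    for i in range(0, len(bits), 8):
        # The first bit of the stream lands in bit 0 of each byte.
        out.append(sum(bit << j for j, bit in enumerate(bits[i:i + 8])))
    return bytes(out)


for sample in (b"\x00", b"\x01", b"\x01\x01\x01\x01\x02\x02"):
    encoded = encode_fixed_literals(sample)
    # zlib's raw inflate (wbits=-15) should accept the hand-built stream.
    assert zlib.decompress(encoded, wbits=-15) == sample
    print(sample.hex(" "), "->", encoded.hex(" "))
```

For the inputs in the question this reproduces the listed bytes, for example 01 gives 63 04 00. The header bits and the first five bits of the literal's code fill the first byte (0x63), the remaining three code bits land in the low bits of the second byte, and the 7-bit end-of-block code is all zeros.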
