
I am trying to understand how the zlib algorithm, as implemented in Python, works. I understand that it uses a variant of DEFLATE. I am wondering whether it is possible to work out by hand how to get from the given data to the compressed data.

Data: 00    Compressed data: 63 00 00    bin: 01100011 00000000 00000000
Data: 01    Compressed data: 63 04 00    bin: 01100011 00000100 00000000
Data: 02    Compressed data: 63 02 00    bin: 01100011 00000010 00000000
Data: 03    Compressed data: 63 06 00    bin: 01100011 00000110 00000000
Data: 04    Compressed data: 63 01 00    bin: 01100011 00000001 00000000

The data above was compressed with zlib at compression level 1, and the 78 01 header and the Adler-32 checksum have been stripped. So what is left is the compressed data from DEFLATE (if I am not wrong).
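
For anyone who wants to reproduce the table, here is a minimal sketch, assuming Python's standard `zlib` module. The helper name `raw_deflate_part` is made up for illustration; it simply drops the 2-byte zlib header and the 4-byte Adler-32 trailer. The exact bytes printed may depend on the zlib build in use.

```python
import zlib


def raw_deflate_part(data: bytes, level: int = 1) -> bytes:
    """Compress with zlib, then strip the 2-byte header and 4-byte Adler-32.

    Hypothetical helper for illustration; what remains is the raw DEFLATE data.
    """
    stream = zlib.compress(data, level)
    return stream[2:-4]


if __name__ == "__main__":
    for value in range(5):
        body = raw_deflate_part(bytes([value]))
        # Print the input byte and the stripped DEFLATE bytes in hex.
        print(f"Data: {value:02x}    Compressed data: {body.hex(' ')}")
```

The stripped bytes can be checked by inflating them as a raw stream, for example with `zlib.decompress(body, wbits=-15)`.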

Numbering the bits as such,

+--------+
|76543210|
+--------+

and the bytes as follows,

    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+

(From the DEFLATE standard here.)

I understand that bit 0 of the first byte marks the last block of the file, and that bits 1 and 2 indicate that DEFLATE is using mode 10 compression. But I am unable to understand what the rest of the bits mean or how to compute them by hand.

Below is a longer example:

Data: 01 01 01 01 02 02    Compressed data: 63 64 64 64 64 62 02 00    bin: 01100011 01100100 01100100 01100100 01100100 01100010 00000010 00000000

1 Answer


zlib does not use a "variant" of DEFLATE. It uses DEFLATE.

The DEFLATE compressed data format is fully described in RFC 1951.

You can use infgen to disassemble DEFLATE streams. Example output for your first one:

! infgen 2.4 output
!
last
fixed
literal 0
end

It says that the first block is also the last block, that it is a fixed-code block, and that there is a literal zero byte as the data contained in the block.
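
The same reading can be done by hand. As a rough sketch (not how zlib or infgen is implemented), the Python below walks the bits of a fixed-code block that contains only literals, using the fixed code table from RFC 1951 section 3.2.6. The function names are made up for illustration, and length/distance codes are deliberately not handled.

```python
def bits_lsb_first(data: bytes):
    """Yield the bits of each byte starting from bit 0, as RFC 1951 packs them."""
    for byte in data:
        for i in range(8):
            yield (byte >> i) & 1


def read_int(bits, n):
    """Read an n-bit header field; these are stored least-significant bit first."""
    value = 0
    for i in range(n):
        value |= next(bits) << i
    return value


def read_code(bits, n, prefix=0):
    """Append n bits to a Huffman code; codes are stored most-significant bit first."""
    code = prefix
    for _ in range(n):
        code = (code << 1) | next(bits)
    return code


def decode_fixed_literals(data: bytes) -> bytes:
    """Decode one fixed-code block made of literals only (illustrative sketch)."""
    bits = bits_lsb_first(data)
    read_int(bits, 1)              # BFINAL: 1 means this is the last block
    btype = read_int(bits, 2)      # BTYPE: 1 (binary 01) means fixed codes
    if btype != 1:
        raise ValueError("this sketch only handles fixed-code blocks")
    out = bytearray()
    while True:
        code = read_code(bits, 7)
        if code == 0:              # 0000000 is the end-of-block symbol (256)
            return bytes(out)
        if code <= 0b0010111:      # other 7-bit codes are length symbols 257-279
            raise ValueError("length/distance codes are not handled here")
        code = read_code(bits, 1, code)
        if code <= 0b10111111:     # 8-bit codes 00110000-10111111: literals 0-143
            out.append(code - 0b00110000)
        elif code <= 0b11000111:   # 8-bit codes 11000000-11000111: lengths 280-287
            raise ValueError("length/distance codes are not handled here")
        else:                      # 9-bit codes 110010000-111111111: literals 144-255
            code = read_code(bits, 1, code)
            out.append(code - 0b110010000 + 144)


if __name__ == "__main__":
    print(decode_fixed_literals(bytes.fromhex("630000")))
    print(decode_fixed_literals(bytes.fromhex("6364646464620200")))
```

For `63 00 00`, the first three bits give "last" and "fixed", the next eight bits (five from the first byte, three from the second) are the code `00110000` for literal 0, and the following seven zero bits are the end-of-block code. The remaining bits are padding.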

You can also look at the source code for infgen to assist in your understanding of RFC 1951.
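
Going the other way, from data to compressed bytes, is also possible by hand for inputs like the ones in the question. The sketch below is an assumption-laden illustration, not zlib's algorithm: it always emits a single fixed-code block of literals and never searches for repeated strings, so it should only be expected to match zlib's output on short inputs where zlib makes the same choices. The function name is made up for illustration.

```python
import zlib


def encode_fixed_literals(data: bytes) -> bytes:
    """Encode data as one final fixed-code DEFLATE block of literals (sketch)."""
    # BFINAL = 1, then BTYPE = 01 written least-significant bit first.
    bits = [1, 1, 0]

    def put_code(code, length):
        # Huffman codes go into the stream most-significant bit first.
        for i in reversed(range(length)):
            bits.append((code >> i) & 1)

    for byte in data:
        if byte <= 143:
            put_code(0b00110000 + byte, 8)           # literals 0-143: 8-bit codes
        else:
            put_code(0b110010000 + (byte - 144), 9)  # literals 144-255: 9-bit codes
    put_code(0, 7)                                   # end-of-block, code 0000000

    while len(bits) % 8:                             # pad the last byte with zeros
        bits.append(0)

    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for i, bit in enumerate(bits[start:start + 8]):
            value |= bit << i                        # first bit lands in bit 0
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    encoded = encode_fixed_literals(bytes.fromhex("010101010202"))
    print(encoded.hex(" "))
    # Raw inflate (negative wbits) should give the original bytes back.
    print(zlib.decompress(encoded, wbits=-15).hex(" "))
```

Applied to the inputs in the question, this bit layout yields the same bytes that were posted, for example `63 04 00` for the single byte `01`.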