3
votes

need some help to understand how DEFLATE Encoding works. I know that is a combination of the LZSS algorithm and Huffman coding.

So let encode for example "Deflate late". Params: [Search buffer: 8kb and Look-ahead buffer 4kb] Well, the output of LZSS algorithm is "Deflate <5, 4>" The next step uses static huffman coding to reduce the redundancy. Here is my problem, I dont know how should i encode this pair <5, 4> with huffman.


[Edited]

D 000
f 001
l 010
a 011
t 100
_ 101
e 11

So well, according to this table the string "Deflate " is written as 000 11 001 010 011 100 11 101. As a next step lets encode the pair (5, 4). The fixed prefix code of the length 4 according to the book "Data Compression - The Complete Reference" is 258, followed by fixed prefix code of the distance 5 (Code 4 + 1 Extra bit).

That can be summarized as:

length 4 -> 258 -> 0000010
distance 5 -> 4 + 1 extra bit -> 00100|0

So, the encoded string is written as [header: 1 01] 000 11 001 010 011 100 11 101 0000010 001000 [end-of-block: 0000000], BUT if i create a huffman tree, it is not a static huffman anymore, right?

Good day

1
Since you didn't ask how to encode the "Deflate", then you must already know how to emit the Huffman codes for those literals. You do exactly the same thing where you emit a length of 4 instead of a literal, followed by a distance code of 5.Mark Adler
So well, according to this table the string "Deflate " is written as 000 11 001 010 011 100 11 101. As a next step lets encode the pair (5, 4). The fixed prefix code of the length 4 according to the book "Data Compression - The Complete Reference" is 258, followed by fixed prefix code of the distance 5. [summarized as]: length 4 -> 258 -> 0000010 [7 Bits] distance 5 -> 4 + 1 extra bit -> 00100|0 So, the encoded string is written as [header: 1 01] 000 11 001 010 011 100 11 101 0000010 001000 [end-of-block: 0000000], BUT if i create a huffman tree, it is not a static huffman, right?FewG

1 Answers

13
votes
D 000
f 001
l 010
a 011
t 100
_ 101
e 11

is not the Deflate static code. The static literal/length codes are all 7, 8, or 9 bits, and the distance codes are all 5 bits. You asked about the static codes.

'Deflate late' encoded in static deflate format as the literals 'Deflate ' and a length 4, distance 5 match in hex is:

73 49 4d cb 49 2c 49 55 00 11 00

That is broken down as follows (bits are read from the least significant part of each byte first):

011 - 01 means fixed code, 1 means last block
00101110 - D
10101001 - e
01101001 - f
00111001 - l
10001001 - a
00100101 - t
10101001 - e
00001010 - space
0100000 - length 4
00100 - distance 5 or 6 depending on one extra bit
0 - extra bit -> distance 5
0000000 - end code
0 - fill bit to byte boundary