2
votes

I am currently reading about the DEFLATE method for encoding/decoding data. I understand that the process is composed of two parts:

i. Replace duplicate information (within a specified window) with a reference back to the previous identical piece.

ii. Use Huffman coding to reduce the size of the most commonly occurring symbols.

I have a question with regards to (i). DEFLATE uses LZ77 which, based on a size window, searches through the information and, if it finds any duplicate information, replaces it with a "pointer". That makes perfect sense.

However, when decoding using LZ77 how does DEFLATE recognize a pointer? (Pointers are length-distance pairs; how can you discern if it's a pointer or just a number that was present in the initial data?)

Reference: http://en.wikipedia.org/wiki/DEFLATE#Duplicate_string_elimination

1
Some interesting info here: zlib.net/feldspar.htmlJoe

1 Answers

4
votes

It's recommended to read the Deflate RFC 1951 specification, which is much more precise, and answer such questions.

What you'll see in => 3.2.5. Compressed blocks (length and distance codes)

"the literal and length alphabets are merged into a single alphabet"

which means that, by simply retrieving the next symbol, you immediately know if it is a literal (0..255), or a match length (257..285), or even an end of block (256). In case of a match length, a reference (offset) must be decoded too. Offset are encoded using a separate tree.