While Compressing a file Using Huffmann coding, After assigning Huffmann codes to each character in a file, these characters should be replaced with equivalent Huffmann codes in the compressed file. Then how the equivalent characters gets extracted with those Huffman codes from the compressed files while decompressing the file. Do the compressed file contains some extra information to decode the Huffmann codes?
0
votes
In the typical implementation of this, you store the tree of encoded symbols before the compressed data, this way the decompression first reads and reconstructs the tree of symbols, then uses that to decompress the data that follows. See if my answer to a different Huffman question helps you out, if not, please feel free to expand on what exactly it is you're asking about.
- Lasse V. Karlsen
Thanks a lot. After having looked at the above mentioned answer, I am left with one question how I can generate the huffman codes using the tree(001A1C01E01B1D)?
- Anish
You read those bits in and regenerate the tree. To decode the compressed stream you start at the root, then read the next bit, if it's a 0 go left, if it's a 1 go right, and when you reach a leaf node, output the symbol in the leaf node then reset back to the root and keep reading bits. Can you clarify exactly what your question is?
- Lasse V. Karlsen
My question was how I can regenerate the huffman codes assigned for characters(using the tree for example 001A1C01E01B1D). However this part of the above mentioned answer explained it all. To read, do this: Read bit. If 1, then read N-bit character/byte, return new node around it with no children If bit was 0, decode left and right child-nodes the same way, and return new node around them with those children, but no value . Thanks a lot again.
- Anish
The above stated method works for .txt files but I am not able to compress the PDF files. I need help.
- Anish
1 Answers
0
votes
Yes. You need to send a description of the Huffman code in order to decode them.
The usual implementation is to encode using a canonical Huffman code, and then sending just the lengths for each symbol. The description of the code can itself be compressed.