0
votes

Start with a text file that looks like this:

a: 0
b: 100
c: 101
d: 11

0 0 100 100 11 101

So this would decode: aabbdc

What decoding algorithm could I use that builds a Huffman tree and then uses it to decode the message? Sample code would be highly appreciated as well!

Here is what I was thinking:

  • create a lookup table that maps each symbol to its bits
  • create a root node
  • to build the tree, read in each bit from the encoded
    • if 0, create a left child. if 1, create a right child
    • if space is reached, indicate a leaf somehow (null left and right pointers)
      • use the bits we read up until that space and see what this is the lookup table
      • insert the character at that leaf

then, I could just read in each bit again and have it move through the tree. when it hits a space, I would just return the character at the leaf it reached?

1
write the code and see what happensPepe
Having the codes separated by spaces removes all the complication from the decoding, and indeed removes the need for the codes to be prefix-free. Also, if you're serious about Huffman coding, look up table-based decoding.harold

1 Answers

2
votes

There aren't / shouldn't be any spaces in the input. You should just get something like 0010010011101.

To create the tree, for each character, start at the root, and, for each bit, go left if it's a 0 or go right if it's a 1 (creating nodes where required). When you've reached the end of some character, set the value of the node we're at to the character.

Then, go through the input, starting from the root in the tree - do the same as above, but, rather than creating nodes, just stop when you reach a leaf, output the value at that node and go back to the root.

Example:

a = 0 - just create a left child for the root.

  .
 /
a

b = 100 - go right (for the 1), then left (for the 0), then left again (for the 0).

  .
 / \
a   .
   /
  .
 /
b

c = 101 - go right, then left, then right.

  .
 / \
a   .
   /
  .
 / \
b   c

d = 11 - go right, then right.

  .
 / \
a   .
   / \
  .   d
 / \
b   c

When processing the input 00100...

Start at the root.

We get a 0, so go left.
We get to a leaf that's value is a, so output it and go back to the root.

We get a 0, so go left.
We get to a leaf that's value is a, so output it and go back to the root.

We get a 1, so go right.
We get a 0, so go left.
We get a 0, so go left.
We get to a leaf that's value is b, so output it and go back to the root.