0
votes

We just started on Huffman tree's in class and I'm having a few issues. First given the data and frequencies...

 Data         %    /    -    +     *
 Frequencies  5    10   25   30    50

Create a custom Huffman tree.

I created...

                            120
                           /   \
                          50   70
                               /  \
                              30   40
                                   / \
                                  15  25
                                 / \
                                 5  10

And then replaced the frequencies with the corresponding data, but my roommate received a different answer? Am I the one in the wrong here?

Also, I can't seem to wrap my head around this question,

What would the Huffman code look like if all symbols in the alphabet had
equal frequency?

Any and all help is much appreciated!

P.S. These are just study guide questions, not homework related.

EDIT: How I arrived at my answer:

Took 5 and 10 at the bottom of the tree, added those together to get a "ghost" node 15. Added 25 to the right of that because it is bigger, then created a ghost node 40 by adding those together. put 30 to the left of 40 because it is smaller, and then created a ghost node 70 by adding the two. Finally added 50 to the left of 70 because it was smaller and then created the final ghost node 120 by adding the two.

2
Did you write a program to calculate the tree? Or, if not, can you describe your algorithm? - Kenney
for the second part: remember that the huffman tree is there to figure out vari-length representations of each character in the raw data. more frequent chars end up with shorter representations. if all chars have the same frequency, then there'd be very little point in doing huffman encoding on the data. you'd be better off just defining a new alphabet with fixed (if shorter) bit sequences for each char. - Marc B
@Kenney i have updated my question with how I came to my answer. I understand that part, and I think it's a rather odd question, but what would the code look like if it was the case of all letters having the same frequency - Bob
A program is nothing else than an algorithm written so a computer can execute it. So, it's a description of what to do when presented with a frequency table, which you gave in your edit - thanks! - Kenney

2 Answers

2
votes

This is what my Huffman.pm comes up with:

0  -- *
10  -- +
110  -- -
1110  -- /
1111  -- %

The above are the shortest symbols encoding the input string. It is a tree, with an implicit root:

   Root
  /    \
0(*)    1
       /  \
     0(+)  1
          /  \
        0(-)  1
             /  \
           0(/) 1(%)

There are different ways to encode it, but it must be consistent. As per wikipedia:

the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol

Your tree

                        120
                       /   \
                    50(*)    70            
                           /  \
                        30(+)  40          
                               / \
                              15  25(-)
                             /  \
                          5(%)  10(/)

satisfies this, although with a different encoding:

0     *
10    +
1100  %
1101  /
111   -

The reason our trees are different is because mine is a Canonical Huffman Code, which means that I only have to list the symbols and length of their encoding (or path) for anyone to be able to re-construct the Huffman codes/tree:

*: 1
+: 2
-: 3
/: 4
%: 4

This is because a 0 always terminates a code, and 1 always means that at least 1 more node will follow, except for the outermost leaf (%, 1111), which is possible because we know the maximum code length (the maximum depth of the tree).

1
votes

You applied the Huffman algorithm correctly, your tree is fine, and it is the only possible tree with those frequencies. There may be different ways to draw it, but all correct trees for those frequencies will have the exact same topology. What is on the left or on the right does not matter. All that matters is how many branches there are to each symbol.

There are 16 different ways to assign bits to the branches (for each branch 0 could go on the left or the right), so you can get that many different codes. However they are all optimal. If your roommate's code had the same number of bits for each symbol, then you're both right, regardless of how you drew your trees or how you assigned the 0's and 1's.