Huffman Tree Issue

Question

We just started on Huffman tree's in class and I'm having a few issues. First given the data and frequencies...

 Data         %    /    -    +     *
 Frequencies  5    10   25   30    50

Create a custom Huffman tree.

I created...

And then replaced the frequencies with the corresponding data, but my roommate received a different answer? Am I the one in the wrong here?

Also, I can't seem to wrap my head around this question,

What would the Huffman code look like if all symbols in the alphabet had
equal frequency?

Any and all help is much appreciated!

P.S. These are just study guide questions, not homework related.

EDIT: How I arrived at my answer:

Took 5 and 10 at the bottom of the tree, added those together to get a "ghost" node 15. Added 25 to the right of that because it is bigger, then created a ghost node 40 by adding those together. put 30 to the left of 40 because it is smaller, and then created a ghost node 70 by adding the two. Finally added 50 to the left of 70 because it was smaller and then created the final ghost node 120 by adding the two.

Did you write a program to calculate the tree? Or, if not, can you describe your algorithm? — Kenney
for the second part: remember that the huffman tree is there to figure out vari-length representations of each character in the raw data. more frequent chars end up with shorter representations. if all chars have the same frequency, then there'd be very little point in doing huffman encoding on the data. you'd be better off just defining a new alphabet with fixed (if shorter) bit sequences for each char. — Marc B
@Kenney i have updated my question with how I came to my answer. I understand that part, and I think it's a rather odd question, but what would the code look like if it was the case of all letters having the same frequency — Bob
A program is nothing else than an algorithm written so a computer can execute it. So, it's a description of what to do when presented with a frequency table, which you gave in your edit - thanks! — Kenney

Kenney Kenney · Accepted Answer · 2015-11-20T20:34:08

This is what my Huffman.pm comes up with:

0  -- *
10  -- +
110  -- -
1110  -- /
1111  -- %

The above are the shortest symbols encoding the input string. It is a tree, with an implicit root:

   Root
  /    \
0(*)    1
       /  \
     0(+)  1
          /  \
        0(-)  1
             /  \
           0(/) 1(%)

There are different ways to encode it, but it must be consistent. As per wikipedia:

the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol

Your tree

                        120
                       /   \
                    50(*)    70            
                           /  \
                        30(+)  40          
                               / \
                              15  25(-)
                             /  \
                          5(%)  10(/)

satisfies this, although with a different encoding:

The reason our trees are different is because mine is a Canonical Huffman Code, which means that I only have to list the symbols and length of their encoding (or path) for anyone to be able to re-construct the Huffman codes/tree:

*: 1
+: 2
-: 3
/: 4
%: 4

This is because a 0 always terminates a code, and 1 always means that at least 1 more node will follow, except for the outermost leaf (%, 1111), which is possible because we know the maximum code length (the maximum depth of the tree).

Huffman Tree Issue

2 Answers