1
votes

I was looking at constructing optimal Huffman codes over non-binary alphabets.

This question was asked in Huffman trees for non-binary alphabets?. The solution suggested to use the Huffman coding procedure combining n lowest frequency symbols at a time (as is also suggested by wikipedia). However, This does not seem to be optimal. Say I have 4 alphabets with frequencies,

 A --> 0.4

 B --> 0.25

 C --> 0.2

 D --> 0.15

The ternary Huffman code derived here using this would be

A --> 0

B --> 10

C --> 11

D --> 12

However the following code would have shorter expected length:

A --> 0

B --> 1

C --> 20

D --> 21

Am I missing something here?

PS I am posting this as a question because I can not comment on the previous post.

1
Can you please explain the steps on how you get your example solution (A->0, B->10 etc.) as when I perform the Huffman algorithm with n=3 I get the results similar to what you're expecting, not the ones you obtain. - Happington
I combine the three least probable characters B,C,D to get a supercharacter BCD of probability .6. Next, I have to find a Huffman code for A, and BCD with probabilities .4 and .6. I assign 0 to A and 1 to BCD. Then I unravel BCD to get 10 for B, 11 for C and 12 for D. - Devil
Ohhh I see where you're going wrong, I'll explain it in an answer, hold up. - Happington
Sorry for the incorrect information, I have updated my answer with a more correct version and a proper reference. - Happington

1 Answers

2
votes

The wikipedia article pointed to says "Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added." I think for a 3-ary tree the next full tree after 3 leaves has 5 leaves, so I think you should add a 0-probability character before running the 3-ary huffman code algorithm and this gives you {0, C, D} as the first stage, which produces the encoding you prefer.