0
votes

I tried to implement the algorithm according to the example:

enter image description here

"Shannon Fano"

According to this decision, I have to get A = 11, B = 101, C = 100, D = 00, E = 011, F = 010. But i get A = 11 B = 101 C = 100 D = 01 E = 001 F = 000 This my code:

Input parametrs: frequencies = [50, 39, 18, 49, 35, 24] и chars =  [A, B, C, D, E, F]
OnClick: SearchTree(' ',' ', 1, charCount,Memo1);


procedure SearchTree(branch:char; full_branch:string; start_pos:integer; end_pos:integer; memo:TMemo);
var
  dS:real; 
  i, m, S:integer; 
  c_branch:string; 
  x,y,j:integer;
begin
  if (branch<>' ') then c_branch := full_branch + branch
  else c_branch := '';

  if (start_pos = end_pos) then
  begin
    memo.Lines.Add(chars[start_pos]+ ' = ' + c_branch);
    exit;
  end;

   x:=0;  y:=0;
   i:=start_pos-1;  j:=end_pos;
   repeat
   Inc(i);
   x:=x+frequencies[i];
   while ((x>=y) and (i<>j))do
     begin
      y:=y+frequencies[j];
      Dec(j);
     end;
     m:=i;
   until (i=j);
  SearchTree('1', c_branch, start_pos, m,memo);
  SearchTree('0', c_branch, m+1, end_pos,memo);
end;

Prompt if I understood the algorithm and what is my problem?

1
There is a sample Delphi implementation and worked example of Shannon-Fano encoding on pp416-420 of Julian Bucknall's Algorithms & Data Structures book, ISBN 1-55622-736-1. - MartynA

1 Answers

3
votes

I will not try to decipher your code, but for what it's worth, what you are doing here is not the way I have understood Shannon-Fano encoding.

I must admit right away that I have never personally coded one (opting for Huffman encoding instead, which offers as good as or better compression for all input data at the sacrifice of some complexity).

Here's how I believe Shannon-Fano codes should be constructed from your sample data:

Given character frequencies:

A   B   C   D   E   F
50, 39, 18, 49, 35, 24 = 215 (ideal distribution = 215 / 2 = 107.5 to each start bit)

Now sort the frequencies:

A   D   B   E   F   C
50, 49, 39, 35, 24, 18 = 215 (ideal distribution = 215 / 2 = 107.5 to each start bit)

Now find the split point in this list that gives the least amount of "error" (waste):

50 | 49 39 35 24 18 -- error (distance to ideal 107.5) = 57.5 
50 49 | 39 35 24 18  -- error (distance to ideal 107.5) = 8.5
50 49 39 | 35 24 18 -- error (distance to ideal 107.5) = 30.5

So the best split point at the first level is between 49(D) and 39(B), which in turn means that we have AD on the left-hand branch and BEFC on the right-hand branch.

Since there happens to be only two characters left on the left-hand branch we have the encoding for those directly:

Assuming left is 1 and right is zero, A becomes 11 and D becomes 10.

All the remaining character encodings (BEFC) then begin with zero.

You now repeat this process recursively with the remaining list in the same way until there are at most two entries in the list, and you are done.