0
votes

I'm rewriting the Huffman coding algorithm as a Haskell rookie exercise and I'm struggling a bit reforming the tree I serialized using the following technique:
- Walk the tree from root in depth-first preorder
- Encounter a Node put 0 followed by recurse on left then right childs
- Encounter a Leaf put 1 followed by the symbol byte

My serialization code:

serializeHtree :: (Ord a) => CodeDict a -> HTree a -> [Bit]
serializeHtree dict (Leaf a) = I : (myLpad $ dict M.! a)
serializeHtree dict (Branch l r) = O : (serializeHtree dict r ++ serializeHtree dict l)

Where:
- CodeDict is a Map from a to [Bit]
- myLpad fill the variable-length huffman code with 0 to make it fixed-length
- and Bit is my own datatype constructed by O and I

For example,
Huffman tree

The tree above will be represented as:
0100000000001000001001000001010100000110100000111

Now to deserialize it, I know I have to read the stream bit per bit and:
- Encounter a 1 make a Leaf with the next byte
- Encounter a 0 make a branch recursing on left then right subtrees

But I'm failing in my approach, here is my deserialization code (non functional this time):

deserializeHtree :: [Bit] -> HTree a
deserializeHtree (x:xs) = case x of
    'O' -> Branch (deserializeHtree xs) (deserializeHtree xs)
    'I' -> Leaf (head xs)  

Thanks for support

1
Your algorithm description doesn't match annotations in the picture, which doesn't match the bit representation under it. Something is amiss.Fyodor Soikin
Sorry I took the picture from the net. I should have erased the 0 and 1 on the branches (they are used for encoding phase). I admit it's confusingBen
This seems quite similar to this question but with bits instead of characters.HTNW

1 Answers

2
votes

You need to write a parser, using a type suitable for recursion.

At the top level you indeed want [Bit] -> HTree a (probably restricting a to some type class, but I'll neglect this). However, for enabling recursion you need

parser :: [Bit] -> (HTree a, [Bit])
-- or, if you need to handle failure
parser :: [Bit] -> Maybe (HTree a, [Bit])

The idea is that parser is fed with the bits to be parsed. It then tries to parse the first tree which is represented in the prefix of those bits. If it succeeds, it returns an HTree a and a list of the exceeding (non consumed) bits.

Returning the non consumed bits is crucial for Branch where you need to parse two subtrees. You parse the first one, take the non consumed bits, and then use those to start the parser of the right subtree.

This topic is too broad for a SO answer, but if you google for "Haskell parser monad" you should find plenty of examples.