I'm rewriting the Huffman coding algorithm as a Haskell rookie exercise and I'm struggling a bit reforming the tree I serialized using the following technique:
- Walk the tree from root in depth-first preorder
- Encounter a Node put 0 followed by recurse on left then right childs
- Encounter a Leaf put 1 followed by the symbol byte
My serialization code:
serializeHtree :: (Ord a) => CodeDict a -> HTree a -> [Bit]
serializeHtree dict (Leaf a) = I : (myLpad $ dict M.! a)
serializeHtree dict (Branch l r) = O : (serializeHtree dict r ++ serializeHtree dict l)
Where:
- CodeDict is a Map from a to [Bit]
- myLpad fill the variable-length huffman code with 0 to make it fixed-length
- and Bit is my own datatype constructed by O and I
The tree above will be represented as:
0100000000001000001001000001010100000110100000111
Now to deserialize it, I know I have to read the stream bit per bit and:
- Encounter a 1 make a Leaf with the next byte
- Encounter a 0 make a branch recursing on left then right subtrees
But I'm failing in my approach, here is my deserialization code (non functional this time):
deserializeHtree :: [Bit] -> HTree a
deserializeHtree (x:xs) = case x of
'O' -> Branch (deserializeHtree xs) (deserializeHtree xs)
'I' -> Leaf (head xs)
Thanks for support