Well, you're trying to parse a tree of bytes from a bit-stream. Parsing's one of those cases where it pays to set up some structure: we're going to write a miniature parser combinator library in the style of How to Replace Failure by a List of Successes, which will allow us to write our code in an idiomatic functional style and delegate a lot of the work to the machine.
Translating the old rhyme into the language of monad transformers, and reading "string" as "bit-string", we have
newtype Parser a = Parser (StateT [Bool] [] a)
deriving (Functor, Applicative, Monad, Alternative)
runParser :: Parser a -> [Bool] -> [(a, [Bool])]
runParser (Parser m) = runStateT m
A parser is a monadic computation which operates statefully on a stream of Booleans, yielding a collection of successfully-parsed a
s. GHC's GeneralizedNewtypeDeriving
superpowers allow me to elide the boilerplate instances of Monad
et al.
The goal, then, is to write a Parser (Tree SevenBits)
- a parser which returns a tree of septuples of Booleans. (You can turn the 7 bits into a Word8
at your leisure by deriving a Functor
instance for Tree
and using fmap
.) I'm going to use the following definition of Tree
because it's simpler - I'm sure you can figure out how to adapt this code to your own ends.
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving Show
type SevenBits = (Bool, Bool, Bool, Bool, Bool, Bool, Bool)
Here's a parser that attempts to consume a single bit from the input stream, failing if it's empty:
one :: Parser Bool
one = Parser $ do
stream <- get
case stream of
[] -> empty
(x:xs) -> put xs *> return x
Here's one which attempts to consume a particular bit from the input stream, failing if it doesn't match:
bit :: Bool -> Parser ()
bit b = do
i <- one
guard (i == b)
Here I'm pulling a sequence of seven Booleans from the input stream using replicateM
and packing them into a tuple. We'll be using this to populate Leaf
nodes' contents.
sevenBits :: Parser SevenBits
sevenBits = pack7 <$> replicateM 7 one
where pack7 [a,b,c,d,e,f,g] = (a, b, c, d, e, f, g)
Now we can finally write the code which parses the tree structure itself. We'll be choosing between the Node
and Leaf
alternatives using <|>
.
tree :: Parser (Tree SevenBits)
tree = node <|> leaf
where node = bit False *> liftA2 Node tree tree
leaf = bit True *> fmap Leaf sevenBits
If node
succeeds in parsing a low bit from the head of the stream, it continues to recursively parse the encoding of the left subtree followed by the right subtree, sequencing the applicative actions with liftA2
. The trick is that node
fails if it doesn't encounter a low bit at the head of the input stream, which tells <|>
to give up on node
and try leaf
instead.
Note how the structure of tree
reflects the structure of the Tree
type itself. This is applicative parsing at work. We could alternately have structured this parser monadically, first using one
to parse an arbitrary bit and then using a case
analysis on the bit to determine whether we should continue to parse a pair of trees or a leaf. In my opinion this version is simpler, more declarative, and less verbose.
Also compare the clarity of this code to the low-level style of @behzad.nouri's foldr
-based solution. Rather than building an explicit finite-state machine which switches between parsing nodes and leaves - an imperative-flavoured idea - my design allows you to declaratively describe the grammar to the machine using standard functions like liftA2
and <|>
and trust that the abstractions will do the right thing.
Anyway, here I'm parsing a simple tree consisting of a pair of Leaf
s containing the (binary-encoded) numbers 0
and 1
. As you can see, it returns the single successful parse and an empty stream of remaining bits.
ghci> runParser tree $ map (>0) [0, 1, 0,0,0,0,0,0,0, 1, 0,0,0,0,0,0,1]
[(Node (Leaf (False, False, False, False, False, False, False)) (Leaf (False, False, False, False, False, False, True)),[])]