0
votes

Need some help writing a Haskell function that takes a string and creates a binary tree. Need some help from someone with a little better Haskell experience to fill in some holes for me and describe why because this is a learning experience for me.

I'm given a tree encoded in a single string for a project in Haskell (Example **B**DECA). The star denotes a node any other character denotes a Leaf. I'm trying to fill this data structure with the information read in from input.

data Trie = Leaf Char | Branch Trie Trie

I'm more of a math and imperative programming guy so I made the observation that I can define a subtree by parsing from left to right. A proper tree will have 1 more character than *. Mathematically I would think of a recursive structure.

If the first character is not a * the solution is the first character. Else the solution is a Branch where the subbranches are feed back into the function where the left Branch is the first set of characters where Characters out number *'s and the right Branch is everything else.

constructTrie :: String -> Trie
constructTrie x = if x !! 1 == '*' then
                      let leftSubtree = (first time in drop 1 x where characters out number *'s)
                          rightSubtree = (rest of the characters in the drop 1 x)
                      in Branch constructTrie(leftSubtree) constructTrie(rightSubtree)
                  else Leaf x !! 1

Mostly I need help defining the left and right subtree and if there is anything wrong with defining it this way.

1

1 Answers

4
votes

!! (which, by the way, is 0-indexed) is usually a no-go. It's a very "imperative" thing to do, and it's especially smelly with constant indices like here. That means you really want a pattern match. Also, splitting a list (type String = [Char]) at an index and operating on the two parts separately is a bad idea, because these lists are linked and immutable, so you'll end up copying the entire first part.

The way you want to do this is as follows:

  • If the string is empty, fail.
  • If it starts with a *, then do something that somehow parses the left subtree and gets the remainder of the list in one step, and then parse the right one out of that remainder, making a Branch.
  • If it's another character, make a Leaf.

There's no need to figure out where to split the string, actually split the string, and then parse the halves; just parse the list until you can't anymore and then whatever's left (or should I say right?) can be parsed again.

So: define a function constructTrie' :: String -> Maybe (Trie, String) that consumes the start of a String into a Trie, and then leaves the unparsed bit behind (and gives Nothing if it just fails to parse). This function will be recursive, which is why it gets that extra output value: it needs extra plumbing to move the remainder of the list around. constructTrie itself can then be defined as a wrapper around it:

-- Maybe Trie because it's perfectly possible that the String just won't parse
constructTrie :: String -> Maybe Trie
constructTrie s = do (t, "") <- constructTrie' s
                  -- patmat failure in a monad calls fail; fail @Maybe _ = Nothing
                     return t

-- can make this local to constructTrie in a where clause
-- or leave it exposed in case it's useful
constructTrie' :: String -> Maybe (Trie, String)
constructTrie' "" = Nothing -- can't parse something from nothing!
constructTrie' ('*':leaves) = do (ln, rs) <- constructTrie' leaves
                              -- Parse out left branch and leave the right part
                              -- untouched. Doesn't copy the left half
                                 (rn, rest) <- constructTrie' rs
                              -- Parse out the right branch. Since, when parsing
                              -- "**ABC", the recursion will end up with
                              -- constructTrie' "*ABC", we should allow the slop.
                                 return (Branch ln rn, rest)
constructTrie' (x:xs) = return (Leaf x, xs)

This is a very common pattern: defining a recursive function with extra plumbing to pass around some state and wrapping it in a nicer one. I guess it corresponds to how imperative loops usually mutate variables to keep their state.