9
votes

I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.

The assembler parses the assembly code into a list of Statements, which are either instructions or labels. Some Statements may refer to labels. The assembler needs to convert the Statements into Instructions, which involves eliminating the labels and substituting the label references with an appropriate value.

I have written the first pass of the assembler, which produces a [(String, Int)] representing a map from labels to addresses. I have also written the following function for translating a Statement into an Instruction:

stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
stmtToInstruction addr labels stmt = case stmt of
    ADD d s1 s2     -> Right [I.ADD d s1 s2]
    BEQL s1 s2 l    -> case do label <- find (\e -> fst e == l) labels
                               let labelAddr = snd label
                               let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
                               return (I.BEQ s1 s2 relativeAddr) of
                        Just i -> Right [i]
                        Nothing -> Left $ "Label " ++ l ++ " not defined"
    LABEL _         -> Right []

I've omitted several cases for brevity, but you can see all the possible results here:

  • ADD always succeeds and produces an instruction
  • BEQL can either succeed or fail, depending on whether a label is found
  • LABEL always succeeds, even though it produces no actual instructions

This works as expected. The problem I now have is writing this function:

replaceLabels :: [Statement] -> Either String [I.Instruction]

replaceLabels takes a list of statements, and runs stmtToInstruction on each one. The addr argument to stmtToInstruction must be the length of the [Instruction] accumulated so far. The output may either be a Left String, if one of the label references was invalid, or a Right [I.Instruction], if there were no errors.

mapM :: Monad m => (a -> m b) -> [a] -> m [b] gets us some of the way there, but provides no way to inject the current address into the (a -> m b) function. How do I make this work?

3
You can do this in one pass using laziness - tie the knot using a map.Alec
@Alec can you point to me to any resources on this approach?rossng
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b looks closer to me: hackage.haskell.org/package/base-4.10.0.0/docs/…Jason Hu

3 Answers

4
votes

You're right: the StateT monad transformer will do the trick:

imapM :: (Traversable t, Monad m)
          => (Int -> a -> m b) -> t a -> m (t b)
imapM f = flip runStateT 0 .
  mapM (\a ->
    do
      count <- get
      put $! count + 1
      f count a)

But writing the specialized version for lists might be better:

itraverse :: Applicative f
          => (Int -> a -> f b) -> [a] -> f [b]
itraverse f = go 0 where
  go !_ [] = pure []
  go !count (x:xs) = (:) <$> f count x <*> go (count + 1) xs
1
votes

I've implemented a recursive solution that I'm sure is very inefficient. I'd still be interested to see the 'proper' way of doing this.

replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels [] = Right []
replaceLabels stmts@(s:ss) = replaceLabels' labels stmts 0
    where labels = process stmts

replaceLabels' :: [(String, Int)] -> [Statement] -> Int -> Either String [I.Instruction]
replaceLabels' _ [] _ = Right []
replaceLabels' labels (s:ss) addr = do
                                instructions <- stmtToInstruction addr labels s
                                restInstructions <- replaceLabels' labels ss (addr + length instructions)
                                return (instructions ++ restInstructions)
0
votes

I would start by changing

stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]

into

stmtToInstruction :: [(String, Int)] -> Statement -> Either String (Int -> [I.Instruction])

That is, moving the function that takes the address into the Right branch of the Either. The reason is that label reference errors seem to be independent of addresses, so it's better to handle reference errors first and then worry about the address stuff in isolation.

This function resolves the references:

resolveRefs :: [(String,Int)] -> [Statement] -> Either String [Int -> [Instruction]]
resolveRefs environment = traverse (stmtToInstruction environment)

(traverse is equivalent to mapM but it only requires an Applicative constraint. They are different functions merely for historical reasons.)

Ok, after having handled the errors, lets now focus on the [Int -> [Instruction]] list. It seems that we have to map over it from the left while carrying an accumulated address that we must supply to each function. The mapAccumL function is perfect for this:

resolveAddrs ::  [Int -> [Instruction]] -> [Instruction]
resolveAddrs funcs = mconcat . snd $ accumulate funcs
    where
    accumulate :: [Int -> [Instruction]] -> (Int,[[Instruction]])
    accumulate = mapAccumL step 0
    step address func = let is = func address in (address + length is,is)