Writing an assembler in Haskell - mapM with state?

Question

I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.

The assembler parses the assembly code into a list of Statements, which are either instructions or labels. Some Statements may refer to labels. The assembler needs to convert the Statements into Instructions, which involves eliminating the labels and substituting the label references with an appropriate value.

I have written the first pass of the assembler, which produces a [(String, Int)] representing a map from labels to addresses. I have also written the following function for translating a Statement into an Instruction:

stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
stmtToInstruction addr labels stmt = case stmt of
    ADD d s1 s2     -> Right [I.ADD d s1 s2]
    BEQL s1 s2 l    -> case do label <- find (\e -> fst e == l) labels
                               let labelAddr = snd label
                               let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
                               return (I.BEQ s1 s2 relativeAddr) of
                        Just i -> Right [i]
                        Nothing -> Left $ "Label " ++ l ++ " not defined"
    LABEL _         -> Right []

I've omitted several cases for brevity, but you can see all the possible results here:

ADD always succeeds and produces an instruction
BEQL can either succeed or fail, depending on whether a label is found
LABEL always succeeds, even though it produces no actual instructions

This works as expected. The problem I now have is writing this function:

replaceLabels :: [Statement] -> Either String [I.Instruction]

replaceLabels takes a list of statements, and runs stmtToInstruction on each one. The addr argument to stmtToInstruction must be the length of the [Instruction] accumulated so far. The output may either be a Left String, if one of the label references was invalid, or a Right [I.Instruction], if there were no errors.

mapM :: Monad m => (a -> m b) -> [a] -> m [b] gets us some of the way there, but provides no way to inject the current address into the (a -> m b) function. How do I make this work?

You can do this in one pass using laziness - tie the knot using a map. — Alec
@Alec can you point to me to any resources on this approach? — rossng
foldM :: (Foldable t, Monad m) => (b -> a -> m b) -> b -> t a -> m b looks closer to me: hackage.haskell.org/package/base-4.10.0.0/docs/… — Jason Hu
bobkonf.de/2016/breitner-monaden.html is a resource, albeit in German. wall.org/~lewis/2013/10/15/asm-monad.html is an English one. — Joachim Breitner
Possible duplicate of Haskell labelling items in recursive function — freestyle

dfeuer dfeuer · Accepted Answer · 2017-11-19T00:10:27

You're right: the StateT monad transformer will do the trick:

imapM :: (Traversable t, Monad m)
          => (Int -> a -> m b) -> t a -> m (t b)
imapM f = flip runStateT 0 .
  mapM (\a ->
    do
      count <- get
      put $! count + 1
      f count a)

But writing the specialized version for lists might be better:

itraverse :: Applicative f
          => (Int -> a -> f b) -> [a] -> f [b]
itraverse f = go 0 where
  go !_ [] = pure []
  go !count (x:xs) = (:) <$> f count x <*> go (count + 1) xs

Writing an assembler in Haskell - mapM with state?

3 Answers