Why are monad transformers different to stacking monads?

Question

In many cases, it isn't clear to me what is to be gained by combining two monads with a transformer rather than using two separate monads. Obviously, using two separate monads is a hassle and can involve do notation inside do notation, but are there cases where it just isn't expressive enough?

One case seems to be StateT on List: combining monads doesn't get you the right type, and if you do obtain the right type via a stack of monads like Bar (where Bar a = (Reader r (List (Writer w (Identity a))), it doesn't do the right thing.

But I'd like a more general and technical understanding of exactly what monad transformers are bringing to the table, when they are and aren't necessary, and why.

To make this question a little more focused:

What is an actual example of a monad with no corresponding transformer (this would help illustrate what transformers can do that just stacking monads can't).
Are StateT and ContT the only transformers that give a type not equivalent to the composition of them with m, for an underlying monad m (regardless of which order they're composed.)

(I'm not interested in particular implementation details as regards different choices of libraries, but rather the general (and probably Haskell independent) question of what monad transformers/morphisms are adding as an alternative to combining effects by stacking a bunch of monadic type constructors.)

(To give a little context, I'm a linguist who's doing a project to enrich Montague grammar - simply typed lambda calculus for composing word meanings into sentences - with a monad transformer stack. It would be really helpful to understand whether transformers are actually doing anything useful for me.)

Thanks,

Reuben

What do you mean by "using two separate monads"? Can you give an example? — ErikR
Well - the List monad is kind of special because any expression which is a list is also a value in the List monad. Try to "stack" Writer and IO - Writer w (IO a) is a value in the Writer monad returning an IO a - and that's different from WriterT w IO a - which is a value in the WriterT w IO monad returning an a. — ErikR
hmm, IO is also maybe not the best example. What about (Writer w (Maybe a)) vs (MaybeT (Writer w) a)? These are, I think, the same type, mutatis mutandis, and act the same too. — Reuben
A monad transformer is simply a type t for which t m is a Monad for any Monad m. However, it is not true, in general, that the composition of two monads is a monad - because of this, there aren't many abstract operations (if any at all) that can be defined in terms of the composition of two monads (and without abstraction, programming in Haskell with monads would be very tedious). It is simply a coincidence that most of the transformers commonly encountered compositions of monads (i.e. WriterT x m is Compose m (x,)). — user2407038
@user2407038 I'm not sure it's a coincidence that most of the transformers end up looking like composition. The easiest way to make one is roughly to see that for some particular monad its composition with any other monad will be a monad. The idea that we can do that with particular monads (with implementation specific to that monad) but not for all is basically where transformers came from. You're right that it's not necessary, but it's not a coincidence. — Ben

ErikR ErikR · Accepted Answer · 2016-08-02T03:11:37

To answer you question about the difference between Writer w (Maybe a) vs MaybeT (Writer w) a, let's start by taking a look at the definitions:

newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }
type Writer w = WriterT w Identity

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

Using ~~ to mean "structurally similar to" we have:

Writer w (Maybe a)  == WriterT w Identity (Maybe a)
                    ~~ Identity (Maybe a, w)
                    ~~ (Maybe a, w)

MaybeT (Writer w) a ~~ (Writer w) (Maybe a)
                    == Writer w (Maybe a)
                    ... same derivation as above ...
                    ~~ (Maybe a, w)

So in a sense you are correct -- structurally both Writer w (Maybe a) and MaybeT (Writer w) a are the same - both are essentially just a pair of a Maybe value and a w.

The difference is how we treat them as monadic values. The return and >>= class functions do very different things depending on which monad they are part of.

Let's consider the pair (Just 3, []::[String]). Using the association we have derived above here's how that pair would be expressed in both monads:

three_W :: Writer String (Maybe Int)
three_W = return (Just 3)

three_M :: MaybeT (Writer String) Int
three_M = return 3

And here is how we would construct a the pair (Nothing, []):

nutin_W :: Writer String (Maybe Int)
nutin_W = return Nothing

nutin_M :: MaybeT (Writer String) Int
nutin_M = MaybeT (return Nothing)   -- could also use mzero

Now consider this function on pairs:

add1 :: (Maybe Int, String) -> (Maybe Int, String)
add1 (Nothing, w) = (Nothing w)
add1 (Just x, w)  = (Just (x+1), w)

and let's see how we would implement it in the two different monads:

add1_W :: Writer String (Maybe Int) -> Writer String (Maybe Int)
add1_W e = do x <- e
             case x of
               Nothing -> return Nothing
               Just y  -> return (Just (y+1))

add1_M :: MaybeT (Writer String) Int -> MaybeT (Writer String) Int
add1_M e = do x <- e; return (e+1)
  -- also could use: fmap (+1) e

In general you'll see that the code in the MaybeT monad is more concise.

Moreover, semantically the two monads are very different...

MaybeT (Writer w) a is a Writer-action which can fail, and the failure is automatically handled for you. Writer w (Maybe a) is just a Writer action which returns a Maybe. Nothing special happens if that Maybe value turns out to be Nothing. This is exemplified in the add1_W function where we had to perform a case analysis on x.

Another reason to prefer the MaybeT approach is that we can write code which is generic over any monad stack. For instance, the function:

square x = do tell ("computing the square of " ++ show x)
              return (x*x)

can be used unchanged in any monad stack which has a Writer String, e.g.:

WriterT String IO
ReaderT (WriterT String Maybe)
MaybeT (Writer String)
StateT (WriterT String (ReaderT Char IO))
...

But the return value of square does not type check against Writer String (Maybe Int) because square does not return a Maybe.

When you code in Writer String (Maybe Int), you code explicitly reveals the structure of monad making it less generic. This definition of add1_W:

add1_W e = do x <- e 
              return $ do 
                y <- x 
                return $ y + 1

only works in a two-layer monad stack whereas a function like square works in a much more general setting.

Why are monad transformers different to stacking monads?

2 Answers