They're defined differently because they do different things.
Take the reader monad. Start by thinking about what it means, not about how it works.
A computation in the reader monad is one that depends on an extra piece of information, the reader's "environment". So a Reader Env Int is an Int that depends on the environment (of type Env); if I evaluate it with one environment I'll get one Int value, and if I evaluate it with a different environment I may get another Int value. If I don't have an environment I can't know what value the Reader Env Int is.
Now, what kind of value will give me an Int if I give it an Env? A function of type Env -> Int! So that generalises to e -> a being a monad for each e (with a being the type parameter of the monad; (->) e if you like the prefix notation).
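To make the "a reader is just a function" idea concrete, here is a minimal sketch that uses the Monad instance base already provides for (->) e. The Env record and the names limit and summary are made up for illustration; they are not part of any library.

```haskell
-- Hypothetical environment type, used only for this illustration.
data Env = Env { verbose :: Bool, depth :: Int }

-- A "Reader Env Int" written directly as a function of the environment.
limit :: Env -> Int
limit env = if verbose env then depth env * 2 else depth env

-- base provides Monad ((->) e), so do-notation works on plain functions:
-- every line below is implicitly applied to the same environment.
summary :: Env -> (Int, Bool)
summary = do
  n <- limit     -- run limit against the shared environment
  v <- verbose   -- a record field is itself a function Env -> Bool
  pure (n, v)
```

Applying summary to two different environments gives two different results, and without an Env there is nothing to apply it to, which is exactly the dependence described above.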
Now let's think about the meaning of the writer monad. A computation in the writer monad produces a value, but it also produces an extra value "on the side": the "log" value. And when we bind together a series of monadic computations in the writer monad, the log values will be combined (if we require the log type to be a monoid, then this guarantees log values can be combined with no other knowledge about what they are). So a Writer Log Int is an Int that also comes with a value of type Log.
That sounds a lot like simply a pair: (Log, Int). And that generalises to (w, a) being a monad for each w (with a being the type parameter of the monad). The monoid constraint on w that guarantees we can combine the log values also means that we have an obvious starting value (the identity element for the monoid: mempty), so we don't need to provide anything to get a value out of a value in the writer monad.
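The pair view can be tried out directly, since base ships a Monad instance for (,) w whenever w is a Monoid. The following is only a sketch under that assumption; the Log synonym and the names logged and total are hypothetical.

```haskell
-- Hypothetical log type for this illustration; any Monoid would do.
type Log = [String]

-- A value together with a one-entry log: a "Writer Log Int" as a plain pair.
logged :: String -> Int -> (Log, Int)
logged msg x = ([msg], x)

-- Binding combines the logs with (<>); pure contributes mempty.
total :: (Log, Int)
total = do
  a <- logged "got 2" 2
  b <- logged "got 3" 3
  pure (a + b)
```

Note that total is already a finished pair: nothing has to be supplied to get the Int or the combined log out of it, in contrast to the reader, which is still waiting for its environment.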
The reasoning for the state monad to be s -> (a, s) is actually pretty much a combination of the above; a State S Int is an Int that both depends on an S value (as the reader depends on the environment) and also produces an S value, where binding together a sequence of state computations should result in each one "seeing" the state produced by the previous one. A value that depends on a state value is a function of the state value; if the output comes "along with" a new state value then we need a pair.
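As a rough sketch of that combination, here is a hand-rolled State newtype. It mirrors the s -> (a, s) shape discussed above but is not the transformers or mtl definition; the tick and twoTicks names are made up for the example.

```haskell
-- A hand-written State, assumed here purely for illustration.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)          -- leave the state untouched
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1                   -- second step sees the state from the first
    in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s ->
    let (a, s') = g s                    -- depend on the incoming state (reader-like)
    in runState (k a) s'                 -- pass the produced state on (writer-like)

-- Return the current counter and leave it incremented.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

twoTicks :: State Int (Int, Int)
twoTicks = do
  a <- tick
  b <- tick
  pure (a, b)
```

Running runState twoTicks 10 threads the counter through both steps, so the second tick observes the state left behind by the first.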