Haskell is pulling a trick here. IO both is and isn't pure, depending on how you look at it.
On the "IO is pure" side, you've fallen into the very common error of thinking of a function returning an IO DBThing as if it were returning a DBThing. When someone claims that a function with type Stuff -> IO DBThing is pure, they are not saying that you can feed it the same Stuff and always get the same DBThing; as you correctly note, that is impossible, and also not very useful! What they're saying is that given a particular Stuff you'll always get back the same IO DBThing.
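To make that concrete, here is a minimal sketch. The "database" is just a mutable counter, and lookupThing, Stuff and DBThing are hypothetical stand-ins rather than any real library's API. Applying lookupThing to the same arguments gives the same IO DBThing every time; it is running that value that can produce different DBThings.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical stand-ins for the types in the question.
type Stuff   = String
type DBThing = Int

-- A fake "database" lookup (assumption: the database is a mutable counter).
-- Each run changes the stored value, so each run can see a different result.
lookupThing :: IORef Int -> Stuff -> IO DBThing
lookupThing db key = do
  modifyIORef' db (+ length key)
  readIORef db

main :: IO ()
main = do
  db <- newIORef 0
  -- Building the action is pure: nothing has touched the database yet.
  let action = lookupThing db "abc"
  first  <- action   -- run it once
  second <- action   -- run the very same value again
  print (first, second)   -- prints (3,6)
```

The value named action never changes; only the results of executing it do.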
You actually can't get a DBThing out of an IO DBThing at all, so Haskell doesn't ever have to worry about the database containing different values (or being unavailable) at different times. All you can do with an IO DBThing is combine it with something else that needs a DBThing and produces some other kind of IO thing; the result of such a combination is an IO thing.
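As a rough illustration of "combining", the sketch below uses the standard (>>=) and fmap. fetchThing and report are made-up placeholders (a real fetchThing would query a database); the point is only that every combination still has an IO type.

```haskell
type DBThing = Int

-- Placeholder for a real query (assumption: a constant stands in for the DB).
fetchThing :: IO DBThing
fetchThing = pure 42

-- Something that needs a DBThing and produces another IO thing.
report :: DBThing -> IO String
report n = pure ("thing #" ++ show n)

-- (>>=) :: IO a -> (a -> IO b) -> IO b
-- The DBThing is only ever handed to the next step; it never escapes IO.
fetchAndReport :: IO String
fetchAndReport = fetchThing >>= report

-- fmap applies a pure function to the eventual result; still an IO value.
doubled :: IO Int
doubled = fmap (* 2) fetchThing

main :: IO ()
main = do
  fetchAndReport >>= putStrLn   -- prints thing #42
  doubled >>= print             -- prints 84
```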
What Haskell is doing here is building up a correspondence between manipulation of pure Haskell values and changes that would happen out in the world outside the program. There are things you can do with some ordinary pure values that don't make any sense with impure operations like altering the state of a database. So using the correspondence between IO values and the outside world, Haskell simply doesn't provide you with any operations on IO values that would correspond to things that don't make sense in the real world.
There are several ways to explain how you're "purely" manipulating the real world. One is to say that IO is just like a state monad, only the state being threaded through is the entire world outside your program (so your Stuff -> IO DBThing function really has an extra hidden argument that receives the world, and actually returns a DBThing along with another world; it's always called with different worlds, and that's why it can return different DBThing values even when called with the same Stuff). Another explanation is that an IO DBThing value is itself an imperative program; your Haskell program is a totally pure function doing no IO, which returns an impure program that does IO, and the Haskell runtime system (impurely) executes the program it returns.
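The first metaphor can be sketched as a toy model. Everything here (World, ToyIO, lookupThing) is invented for illustration, and the "world" is shrunk down to a single number standing in for the database contents; it is a picture of the idea, not a description of how IO is actually implemented.

```haskell
-- Toy "world": just the database contents (hypothetical model).
newtype World = World { dbContents :: Int }

type Stuff   = String
type DBThing = Int

-- Toy version of IO a: a pure function from a world to a result
-- together with the (possibly changed) world.
newtype ToyIO a = ToyIO { runToyIO :: World -> (a, World) }

-- Same Stuff always gives the same ToyIO DBThing. The DBThing it
-- yields depends on the hidden world argument.
lookupThing :: Stuff -> ToyIO DBThing
lookupThing key = ToyIO $ \w -> (dbContents w + length key, w)

main :: IO ()
main = do
  let action = lookupThing "abc"
  print (fst (runToyIO action (World 10)))   -- prints 13
  print (fst (runToyIO action (World 20)))   -- prints 23
```

In the toy you can call runToyIO yourself and pick the world; with real IO there is no such operation available to you, which is exactly the limited interface described here.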
But really these are both simply metaphors. The point is that the IO value simply has a very limited interface which doesn't allow you to do anything that doesn't make sense as a real world action.
Note that the concept of monad hasn't actually come into this. Haskell's IO system really doesn't depend on monads; Monad is just a convenient interface which is sufficiently limited that if you're only using the generic monad interface you also can't break the IO limitations (even if you don't know your monad is actually IO). Since the Monad interface is also interesting enough to write a lot of useful programs, the fact that IO forms a monad allows a lot of code that's useful on other types to be generically reused on IO.
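A small sketch of that reuse: pairUp below (a made-up helper, not a library function) is written against nothing but the Monad interface, so it works unchanged on Maybe, on lists, and on IO.

```haskell
-- Written only in terms of the generic Monad interface; it neither knows
-- nor cares whether m is IO.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb = ma >>= \a -> mb >>= \b -> return (a, b)

main :: IO ()
main = do
  -- Used on Maybe
  print (pairUp (Just (1 :: Int)) (Just "x"))
  -- Used on lists
  print (pairUp [1, 2 :: Int] "ab")
  -- Used on IO: the very same code sequences two IO actions
  r <- pairUp (pure (1 :: Int)) (pure 'c')
  print r
```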
Does this mean you actually get to write pure IO code? Not really. This is the "of course IO isn't pure" side of the coin. When you're using the fancy "combining IO functions together" approach, you still have to think about your program executing steps one after the other (or in parallel), affecting and being affected by outside conditions and systems; in short, exactly the same kind of reasoning you have to use to write IO code in an imperative language (only with a nicer type system than most of them). Making IO pure doesn't really help you banish impurity from the way you have to think about your code.
So what's the point? Well for one, it gets us a compiler-enforced demarcation of code that can do IO and code that can't. If there's no IO tag on the type then impure IO isn't involved. That would be useful in any language just on its own. And the compiler knows this too; Haskell compilers can apply optimizations to non-IO code that would be invalid in most other languages because it's often impossible to know that a given section of code doesn't have side effects (unless you can see the full implementation of everything the code calls, transitively).
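For instance, in the sketch below (total and loggedTotal are hypothetical names) the types alone tell you which function can talk to the outside world, setting aside deliberate escape hatches such as unsafePerformIO.

```haskell
-- No IO in the type: this cannot print, read files, or query a database.
total :: [Int] -> Int
total = sum

-- IO in the type: callers can see from the signature that running this
-- may have effects (here, writing to stdout).
loggedTotal :: [Int] -> IO Int
loggedTotal xs = do
  putStrLn ("summing " ++ show (length xs) ++ " numbers")
  pure (total xs)

main :: IO ()
main = do
  print (total [1, 2, 3])   -- prints 6
  n <- loggedTotal [1, 2, 3]
  print n                   -- prints 6, after the log line
```

Trying to call putStrLn inside total would be rejected by the type checker, since total's result type is Int and not IO Int.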
Also, because IO is pure, code analysis tools (including your brain) don't have to treat IO-code specially. If you can pick out a code transformation that would be valid on pure code with the same structure as the IO code, you can do it on the IO code. Compilers make use of this. Many transformations are ruled out by the structure that IO code must use (in order to stay within the bounds of things that have a sensible correspondence to things in the outside world) but they would also be ruled out by any pure code that used the same structure; the careful construction of the IO interface makes "execution order dependency" look like ordinary "data dependency", so you can just use the rules of data dependency to determine the rules of using IO.
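One such transformation, sketched under the assumption of a simple IORef counter (bump, inlined and factored are made-up names): giving a repeated subexpression a name is valid for pure code, and it is just as valid when the subexpression is an IO value.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- An IO action that increments a counter.
bump :: IORef Int -> IO ()
bump ref = modifyIORef' ref (+ 1)

-- The action written out twice...
inlined :: IORef Int -> IO ()
inlined ref = bump ref >> bump ref

-- ...and the same program after naming the repeated subexpression.
-- Binding step does not run anything; it only names the value.
factored :: IORef Int -> IO ()
factored ref = let step = bump ref in step >> step

main :: IO ()
main = do
  r1 <- newIORef 0
  r2 <- newIORef 0
  inlined r1
  factored r2
  a <- readIORef r1
  b <- readIORef r2
  print (a, b)   -- prints (2,2)
```

In a language where evaluating bump ref performed the increment immediately, the second version would increment once rather than twice, so the refactoring would change behaviour.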
f :: IO Int is not a function, it is a value. lookInDatabase :: IO DBThing doesn't return a value: it is a value. It is always the same value, so it is pure. – Rein Henrichs