(Rewritten for clarity)
There seem to be two parts to your question. One is implied and is asking what the algebraic interpretation of forall
is, and the other is asking about the cont/Yoneda transformation, which sdcvvc's answer already covered pretty well.
I'll try to address the algebraic interpretation of forall
for you. You mention that A -> B
is B^A
but I'd like to take that a step further and expand it out to B * B * B * ... * B
(|A|
times). Although we do have exponentiation as a notation for repeated multiplication like that, there's a more flexible notation, ∏
(uppercase Pi) representing arbitrary indexed products. There are two components to a Pi: the range of values we want to multiply over, and the expression that we're multiplying out. For example, at the value level, you might express the factorial function as fact i = ∏ [1..i] (λx -> x)
.
Going back to the world of types, we can view the exponentiation operator in the A -> B ~ B^A
correspondence as a Pi: B^A ~ ∏ A (λ_ -> B)
. This says that we're defining an A
-ary product of B
s, such that the B
s cannot depend on the particular A
we've chosen. Sure, it's equivalent to plain exponentiation, but it lets us move up to cases in which there is a dependence.
In the most general case, we get dependent types, like what you see in Agda or Coq: in Agda syntax, replicate : Bool -> ((n : Nat) -> Vec Bool n)
is one possible application of a Pi type, which could be expressed more explicitly as replicate : Bool -> ∏ Nat (Vec Bool)
, or further as replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool))
.
Note that as you might expect from the underlying algebra, you can fuse both of the ∏
s in the definition of replicate
above into a single ∏
ranging over the cartesian product of the domains: ∏ Bool (\_ -> ∏ Nat (Vec Bool))
is equivalent to ∏ (Bool, Nat) (λ(_, n) -> Vec Bool n)
just like it would be at the "value level". This is simply uncurrying from the perspective of type theory.
I do realize your question was about polymorphism, so I'll stop going on about dependent types, but they are relevant: forall
in Haskell is roughly equivalent to a ∏
with a domain over the type (kind) of types, *
. Indeed, the function-like behavior of polymorphism can be observed directly in GHC core, which types them as capital lambdas (Λ). As such, a polymorphic type like forall a. a -> a
is actually just ∏ * (Λ a -> (a -> a))
(using the Λ notation now that we distinguish between types and values), which can be expanded out to the infinite product (Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...)
for every possible type. Instantiation of the type variable is simply projecting out the suitable element from the *
-ary product (or applying the type function).
Now, for the big piece I missed in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know of (viewing types as relations, or (di)naturality in category theory) really has a very algebraic interpretation. For our purposes, though, it boils down to something fairly simple: you can't pattern-match on *
. I know that GHC lets you do that at the type level with type families, but you can only cover a finite chunk of *
when doing that, so there are necessarily always points at which your type family is undefined.
What this means, from the point of view of polymorphism, is that any type function F
we write in ∏ * F
must either be constant (i.e., completely ignore the type it was polymorphic over) or pass the type through unchanged. Thus, ∏ * (Λ _ -> B)
is valid because it ignores its argument, and corresponds to forall a. B
. The other case is something like ∏ * (Λ x -> Maybe x)
, which corresponds to forall a. Maybe a
, which doesn't ignore the type argument, but only "passes it through". As such, a ∏ A
that has an irrelevant domain A
(such as when A = *
) can be seen as more of an A
-ary indexed intersection (picking the common elements across all instantiations of the index), rather than a product.
Crucially, at the value level, the rules of parametricity prevent any funny behavior that might suggest the types are larger than they really are. Because we don't have typecase, we can't construct a value of type forall a. B
that does something different based on what a
was instantiated to. Thus, although the type is technically a function * -> B
, it is always a constant function, and is thus equivalent to a single value of B
. Using the ∏
interpretation, it is indeed equivalent to an infinite *
-ary product of B
s, but those B
values must always be identical, so the infinite product is effectively as big as a single B
.
Similarly, although ∏ * (Λ x -> (x -> x))
(a.k.a., forall a. a -> a
) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all are constrained to only return their input value and not do any funny business like (+1) : Int -> Int
when instantiated to Int
. Because there is only one (assuming a total language) function that can't inspect the type of its argument but must return a value of that same type, the infinite product is thus just as large as a single value.
Now, about your direct question on (forall r . (a -> r) -> r) ~ a
. First, let's express your ~
operator more formally. It's really isomorphism, so we need two functions going back and forth, and an argument that they're inverses.
data Iso a b = Iso
{ to :: a -> b
, from :: b -> a
-- proof1 :: forall x. to (from x) == x
-- proof2 :: forall x. from (to x) == x
}
and now we express your original question in more formal terms. Your question amounts to constructing a term of the following (impredicative, so GHC has trouble with it, but we'll survive) type:
forall a. Iso (forall r. (a -> r) -> r) a
Which, using my earlier terminology, amounts to ∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a)
. Once again we have an infinite product that can't inspect its type argument. By handwaving, we can argue that the only possible values considering the parametricity rules (the other two proofs are respected automatically) for to
and from
are ($ id)
and flip id
.
If this feels unsatisfying, it's probably because the algebraic interpretation of forall
didn't really add anything to the proof. It's really just plain old type theory, but I hope I was able to provide something that feels a little less categorical than the Yoneda form of it. It's worth noting that we don't actually need to use parametricity to write proof1
and proof2
above, though. Parametricity only enters the picture when we want to state that ($ id)
and flip id
are our only options for to
and from
(which we can't prove in Agda or Coq, for that reason).
forall r. (r -> a, r) ~ a
andforall r s. (a -> (r, s)) -> (r, s) ~ a*a
. Also, Brent Yorgey did a series of blogposts about this: byorgey.wordpress.com/2011/02/24/enumerating-linear-inhabitants – Sjoerd Visscher