9
votes

In Scrap your boilerplate reloaded, the authors describe a new presentation of Scrap Your Boilerplate, which is supposed to be equivalent to the original.

However, one difference is that they assume a finite, closed set of "base" types, encoded with a GADT

data Type :: * -> * where
  Int :: Type Int
  List :: Type a -> Type [a]
  ...

In the original SYB, type-safe cast is used, implemented using the Typeable class.

My questions are:

  • What is the relationship between these two approaches?
  • Why was the GADT representation chosen for the "SYB Reloaded" presentation?
2
That sounds like a similar approach taken here with the universal Univ type, but I've only skimmed that paper.jberryman

2 Answers

5
votes

[I am one of the authors of the "SYB Reloaded" paper.]

TL;DR We really just used it because it seemed more beautiful to us. The class-based Typeable approach is more practical. The Spine view can be combined with the Typeable class and does not depend on the Type GADT.

The paper states this in its conclusions:

Our implementation handles the two central ingredients of generic programming differently from the original SYB paper: we use overloaded functions with explicit type arguments instead of overloaded functions based on a type-safe cast 1 or a class-based extensible scheme [20]; and we use the explicit spine view rather than a combinator-based approach. Both changes are independent of each other, and have been made with clarity in mind: we think that the structure of the SYB approach is more visible in our setting, and that the relations to PolyP and Generic Haskell become clearer. We have revealed that while the spine view is limited in the class of generic functions that can be written, it is applicable to a very large class of data types, including GADTs.

Our approach cannot be used easily as a library, because the encoding of overloaded functions using explicit type arguments requires the extensibility of the Type data type and of functions such as toSpine. One can, however, incorporate Spine into the SYB library while still using the techniques of the SYB papers to encode overloaded functions.

So, the choice of using a GADT for type representation is one we made mainly for clarity. As Don states in his answer, there are some obvious advantages in this representation, namely that it maintains static information about what type a type representation is for, and that it allows us to implement cast without any further magic, and in particular without the use of unsafeCoerce. Type-indexed functions can also be implemented directly by using pattern matching on the type, and without falling back to various combinators such as mkQ or extQ.

Fact is that I (and I think the co-authors) simply were not very fond of the Typeable class. (In fact, I'm still not, although it is finally becoming a bit more disciplined now in that GHC adds auto-deriving for Typeable, makes it kind-polymorphic, and will ultimately remove the possibility to define your own instances.) In addition, Typeable wasn't quite as established and widely known as it is perhaps now, so it seemed appealing to "explain" it by using the GADT encoding. And furthermore, this was the time when we were also thinking about adding open datatypes to Haskell, thereby alleviating the restriction that the GADT is closed.

So, to summarize: If you actually need dynamic type information only for a closed universe, I'd always go for the GADT, because you can use pattern matching to define type-indexed functions, and you do not have to rely on unsafeCoerce nor advanced compiler magic. If the universe is open, however, which is quite common, certainly for the generic programming setting, then the GADT approach might be instructive, but isn't practical, and using Typeable is the way to go.

However, as we also state in the conclusions of the paper, the choice of Type over Typeable isn't a prerequisite for the other choice we're making, namely to use the Spine view, which I think is more important and really the core of the paper.

The paper itself shows (in Section 8) a variation inspired by the "Scrap your Boilerplate with Class" paper, which uses a Spine view with a class constraint instead. But we can also do a more direct development, which I show in the following. For this, we'll use Typeable from Data.Typeable, but define our own Data class which, for simplicity, just contains the toSpine method:

class Typeable a => Data a where
  toSpine :: a -> Spine a

The Spine datatype now uses the Data constraint:

data Spine :: * -> * where
  Constr  :: a -> Spine a
  (:<>:)  :: (Data a) => Spine (a -> b) -> a -> Spine b

The function fromSpine is as trivial as with the other representation:

fromSpine :: Spine a -> a
fromSpine (Constr x) = x
fromSpine (c :<>: x) = fromSpine c x

Instances for Data are trivial for flat types such as Int:

instance Data Int where
  toSpine = Constr

And they're still entirely straightforward for structured types such as binary trees:

data Tree a = Empty | Node (Tree a) a (Tree a)

instance Data a => Data (Tree a) where
  toSpine Empty        = Constr Empty
  toSpine (Node l x r) = Constr Node :<>: l :<>: x :<>: r

The paper then goes on and defines various generic functions, such as mapQ. These definitions hardly change. We only get class constraints for Data a => where the paper has function arguments of Type a ->:

mapQ :: Query r -> Query [r]
mapQ q = mapQ' q . toSpine

mapQ' :: Query r -> (forall a. Spine a -> [r])
mapQ' q (Constr c) = []
mapQ' q (f :<>: x) = mapQ' q f ++ [q x]

Higher-level functions such as everything also just lose their explicit type arguments (and then actually look exactly the same as in original SYB):

everything :: (r -> r -> r) -> Query r -> Query r
everything op q x = foldl op (q x) (mapQ (everything op q) x)

As I said above, if we now want to define a generic sum function summing up all Int occurrences, we cannot pattern match anymore, but have to fall back to mkQ, but mkQ is defined purely in terms of Typeable and completely independent of Spine:

mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(r `mkQ` br) a = maybe r br (cast a)

And then (again exactly as in original SYB):

sum :: Query Int
sum = everything (+) sumQ

sumQ :: Query Int
sumQ = mkQ 0 id

For some of the stuff later in the paper (e.g., adding constructor information), a bit more work is needed, but it can all be done. So using Spine really does not depend on using Type at all.

5
votes

Well, obviously the Typeable use is open -- new variants can be added after the fact, and without modifying the original definitions.

The important change though is that in that TypeRep is untyped. That is, there is no connection between the runtime type , TypeRep, and the static type it encodes. With the GADT approach we can encode the mapping between a type a and its Type, given by the GADT Type a.

We thus bake in evidence for the type rep being statically linked to its origin type, and can write statically typed dynamic application (for example) using Type a as evidence that we have a runtime a.

In the older TypeRep case, we have no such evidence and it comes down to runtime string equality, and a coerce and hope for the best through fromDynamic.

Compare the signatures:

toDyn :: Typeable a => a -> TypeRep -> Dynamic

versus GADT style:

toDyn :: Type a => a -> Type a -> Dynamic

I can't fake my type evidence, and I can use that later when reconstructing things, to e.g. lookup the type class instances for a when all I have is a Type a.