differences: GADT, data family, data family that is a GADT

Question

What are the differences between these three, and why do they exist? Is a GADT (or a regular data type) just shorthand for a data family? Specifically, what's the difference between:

data GADT a  where
  MkGADT :: Int -> GADT Int

data family FGADT a
data instance FGADT a  where             -- note not FGADT Int
  MkFGADT :: Int -> FGADT Int

data family DF a
data instance DF Int  where              -- using GADT syntax, but not a GADT
  MkDF :: Int -> DF Int

(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)

Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.

With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.

For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:

instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...

and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
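To make the overlapping-instance observation concrete, here is a minimal sketch. The class C is the one from my snippet above, but its method describe and the names atInt and atOther are made up for illustration; I'm also assuming FlexibleInstances is switched on.

```haskell
{-# LANGUAGE GADTs, FlexibleInstances #-}

data GADT a where
  MkGADT :: Int -> GADT Int

-- Hypothetical method: the question only names the class C.
class C t where
  describe :: t -> String

-- General instance: matches GADT at any index.
instance C (GADT a) where
  describe _ = "GADT a"

-- More specific instance: chosen when the index is known to be Int.
instance {-# OVERLAPPING #-} C (GADT Int) where
  describe (MkGADT n) = "GADT Int holding " ++ show n

-- Index is Int, so the OVERLAPPING instance is selected.
atInt :: String
atInt = describe (MkGADT 3)

-- Index is Bool, so only the general instance matches.
-- GADT Bool has no constructors, hence undefined; describe never forces it.
atOther :: String
atOther = describe (undefined :: GADT Bool)
```

The same pair of instance heads should work for FGADT, since there is a data instance FGADT a to match against.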

ADDIT: to clarify @kabuhr's answer (thank you)

"contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference"

These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest:

inferDF (MkDF x) = x                 -- works without a signature

The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....

GADTs are awkward, as I already know, so neither of these works without an explicit signature:

inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x

Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.

Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
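For the record, this is roughly what I tried for the FGADT case, as a self-contained sketch. The name inferFGADTInt is just mine, to keep the two signatures apart in one module.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

data family FGADT a

-- One instance covering every index, with a GADT-style constructor.
data instance FGADT a where
  MkFGADT :: Int -> FGADT Int

-- General signature: matching on MkFGADT refines a to Int.
inferFGADT :: FGADT a -> a
inferFGADT (MkFGADT x) = x

-- The same function at the more specific type (hypothetical name).
inferFGADTInt :: FGADT Int -> Int
inferFGADTInt (MkFGADT x) = x
```

Both signatures are accepted; it's only the signature-free version that gets rejected.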

So in some ways data families are generalisations of GADTs. I also see as you say

"So, in some ways, GADTs are generalizations of data families."

Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there are too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. So @HTNW's approach of explaining in terms of yet further extensions is the opposite of what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)

A data family is not much more than a way to group together several newtype/data declarations. In fact, you can almost think of a data family as an injective open type family (where each data instance corresponds to a data decl and a type instance to that decl). That analogy breaks down in two places (that I can think of): data family type constructors can be partially applied and data family type constructors have stronger typing rules (MyDataFamily a ~ g b implies MyDataFamily ~ g and a ~ b, while MyInjectiveTyFam a ~ MyInjectiveTyFam b implies a ~ b). — Alec
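A small sketch of the partial-application point in the comment above. All the names here (Box, IntBox, Wrap, wrapped, unwrap) are invented for illustration; the idea is only that a data family can be passed around unapplied, like any other type constructor of kind Type -> Type.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical data family with a single instance.
data family Box a
newtype instance Box Int = IntBox Int

-- Hypothetical wrapper that takes a type constructor f and applies it to Int.
newtype Wrap f = Wrap (f Int)

-- Box appears here without its argument. A type synonym family
-- could not be used this way, because it must always be fully applied.
wrapped :: Wrap Box
wrapped = Wrap (IntBox 3)

unwrap :: Wrap Box -> Int
unwrap (Wrap (IntBox n)) = n
```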

K. A. Buhr · Accepted Answer · 2018-09-17T15:00:35

As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.

This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:

inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n

but this does not:

inferDF :: DF a -> a
inferDF (MkDF n) = n

and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
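If you want to try this yourself, the following sketch puts the versions that do compile into one module, reusing the definitions from your question. The signature on inferGADT is required, while inferDF is deliberately left without one so you can ask GHCi for its inferred type.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}

data GADT a where
  MkGADT :: Int -> GADT Int

data family DF a
data instance DF Int where
  MkDF :: Int -> DF Int

-- The signature is needed here; matching on MkGADT brings a ~ Int
-- into scope, so returning n at type a is accepted.
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n

-- No signature: MkDF only exists at DF Int, so no refinement is involved
-- and the type is inferred as DF Int -> Int.
inferDF (MkDF n) = n
```

Adding the signature inferDF :: DF a -> a to the last definition is what triggers the type error, and deleting the signature from inferGADT is what triggers the "untouchable" one.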

The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:

data family Foo a b
data instance Foo Int a where
  Bar :: Double -> Foo Int Double
  Baz :: String -> Foo Int String
data instance Foo Char Double where
  Quux :: Double -> Foo Char Double
data instance Foo Char String where
  Zlorf :: String -> Foo Char String

In this case, Foo Int a is a GADT with an inferrable a parameter:

fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"

but Foo Char a is just a collection of separate ADTs, so this won't typecheck:

foodouble :: Foo Char a -> a
foodouble (Quux x) = x

for the same reason inferDF won't typecheck above.
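Here is a sketch of the Foo example as a complete module, with foodouble given a signature at a concrete index so that it compiles. The class FromFooChar and its method fooChar are hypothetical names I'm introducing; a class like this is one common way to get a single overloaded function across the separate Foo Char instances, not the only one.

```haskell
{-# LANGUAGE GADTs, TypeFamilies, FlexibleInstances #-}

data family Foo a b
data instance Foo Int a where
  Bar :: Double -> Foo Int Double
  Baz :: String -> Foo Int String
data instance Foo Char Double where
  Quux :: Double -> Foo Char Double
data instance Foo Char String where
  Zlorf :: String -> Foo Char String

-- Foo Int a is a GADT: each match refines a.
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"

-- Foo Char Double is an ordinary ADT, so the index must be concrete.
foodouble :: Foo Char Double -> Double
foodouble (Quux x) = x

-- Hypothetical class: dispatches on the index, because pattern
-- matching cannot choose between separate data instances.
class FromFooChar a where
  fooChar :: Foo Char a -> a

instance FromFooChar Double where
  fooChar (Quux x) = x

instance FromFooChar String where
  fooChar (Zlorf s) = s
```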

Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:

data family MyDF a
data instance MyDF Int where
  IntLit :: Int -> MyDF Int
  IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
  Positive :: MyDF Int -> MyDF Bool

you can write it as a GADT just by writing separate blocks of constructors:

data MyGADT a where
  -- MyGADT Int
  IntLit' :: Int -> MyGADT Int
  IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
  -- MyGADT Bool
  Positive' :: MyGADT Int -> MyGADT Bool
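One thing the GADT version buys you is a single function that works across all the constructor blocks. As a sketch, here is a hypothetical evaluator eval (not part of the definitions above) over MyGADT:

```haskell
{-# LANGUAGE GADTs #-}

data MyGADT a where
  -- MyGADT Int
  IntLit' :: Int -> MyGADT Int
  IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
  -- MyGADT Bool
  Positive' :: MyGADT Int -> MyGADT Bool

-- Hypothetical evaluator: each match refines a, so the Int cases
-- return an Int and the Bool case returns a Bool.
eval :: MyGADT a -> a
eval (IntLit' n)   = n
eval (IntAdd' x y) = eval x + eval y
eval (Positive' x) = eval x > 0
```

With the MyDF version you would need a separate function per instance, or a type class, to do the same thing.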

So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:

class MyClass a where
  data family MyRep a
instance MyClass Int where
  data instance MyRep Int = ...
instance MyClass String where
  data instance MyRep String = ...

where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
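As a sketch of what a filled-in version might look like: the methods toRep and fromRep and the constructors IntRep and StringRep below are invented for illustration and are not meant to suggest what goes in the elided parts above. Inside class and instance bodies the family and instance keywords can be dropped, which is the form used here.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}

class MyClass a where
  data MyRep a                -- associated data family
  toRep   :: a -> MyRep a     -- hypothetical method
  fromRep :: MyRep a -> a     -- hypothetical method

instance MyClass Int where
  data MyRep Int = IntRep Int
  toRep = IntRep
  fromRep (IntRep n) = n

instance MyClass String where
  -- Hypothetical representation: the cached length plus the text itself.
  data MyRep String = StringRep Int String
  toRep s = StringRep (length s) s
  fromRep (StringRep _ s) = s
```

A new module can add instance MyClass Bool with its own MyRep Bool constructors without touching this code, which is the "open" part a GADT cannot give you.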
