data family instances: newtype, data

Question

With Haskell 98 decls, the whole datatype must be either newtype or data. But data families can have a mix of newtype instance, data instance. Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:

data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

I know I can't write the following, but why/what goes wrong?:

data family Bar a

newtype instance Bar (Maybe Int)  = MkBar1 (Maybe Int)
            -- MkBar1 :: (Maybe Int) -> Bar (Maybe Int), see below

newtype instance Bar [Char]  = MkBar2 [Char]

data instance Bar [Bool]  where
  MkBar3 :: Int -> Bool -> Bar [Bool]
  -- can't be a newtype because of the existential Int

-- we're OK up to here mixing newtypes and data/GADT

data instance Bar [a]  where
  MkBar4 :: Num a => a -> Bar [a]
  -- can't be a newtype because of the Num a =>

I can't write that because the instance head Bar [a] overlaps the two heads for MkBar2, MkBar3. Then I could fix that by moving those two constructor decls inside the where ... for Bar [a]. But then MkBar2 becomes a GADT (because it's result type is not Bar [a]), so can't be a newtype.

Then is being a newtype a property of the result type, rather than of the constructor? But consider the type inferred for newtype instance MkBar1 above. I can't write a top-level newtype with the same type

newtype Baz a  where
  MkBaz :: (Maybe Int) -> Baz (Maybe Int)

-- Error: A newtype constructor must have a return type of form T a1 ... an

Huh? MkBar1 is a newtype constructor whose type is not of that form.

If possible, please explain without diving into talk of roles: I've tried to understand them; it just makes my head hurt. Talk in terms of constructing and pattern-matching on those constructors.

What you’re saying doesn’t make sense. Different instances of a data family are still different types, so the compiler can still statically determine a value’s representation. The whole point of a newtype is to use the same runtime representation as some other type (that is, with no wrappers), but if there are other cases to the type, that clearly isn’t possible. — Alexis King
"Different instances of a data family are still different types," Hmm? They're all of type Bar a. "The whole point of a newtype ..., but if there are other cases to the type, that clearly isn’t possible. " Bar (Maybe Int), Bar [Char] are other cases of to the type, and newtype is possible -- until it gets tangled up with Bar [a]. — AntC
If you have a type family F, then surely you would not say that F Int and F Bool must be the same type, hm? The point is that GHC tracks that difference statically. It does not statically track the difference between different constructors (which is often frustrating to novice Haskellers, who regularly ask questions like “how can I have a type that is T but restricted to exclude data constructor C,” to which the only answer is “define a new data type”). — Alexis King
@AntC "They're all of type Bar a." is not precise. Try it yourself and see: make a data family with at least two instances, then write foo :: Bar a and try to define foo without using undefined. You'll find it's not possible! — Daniel Wagner

Alec Alec · Accepted Answer · 2018-09-20T04:37:28

You can think of data families as stronger versions of (injective and open) type families. As you can see from this question I asked a while back, data families can almost be faked with injective type families.

Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:
data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

No. Being a newtype is definitively a property of the type constructor, not of the data constructor. A data family, quite like a type family, has two levels of type constructors:

the type constructor data/type family FamTyCon a
the type constructors for the instances data/type instance FamInstTyCon a = ...

Different instances of the same data/type family constructor are still fundamentally different types - they happen to be unified under one type constructor (and that type constructor is going to be injective and generative - see the linked question for more on that).

By the type family analogy, you wouldn't expect to be able to pattern match on something of type TyFam a, right? Because you could have type instance TyFam Int = Bool and type instance TyFam () = Int, and you wouldn't be able to statically know if you were looking at a Bool or an Int!

data family instances: newtype, data

2 Answers