0
votes

With Haskell 98 decls, the whole datatype must be either newtype or data. But data families can have a mix of newtype instance, data instance. Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:

data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

I know I can't write the following, but why/what goes wrong?:

data family Bar a

newtype instance Bar (Maybe Int)  = MkBar1 (Maybe Int)
            -- MkBar1 :: (Maybe Int) -> Bar (Maybe Int), see below

newtype instance Bar [Char]  = MkBar2 [Char]

data instance Bar [Bool]  where
  MkBar3 :: Int -> Bool -> Bar [Bool]
  -- can't be a newtype because of the existential Int

-- we're OK up to here mixing newtypes and data/GADT

data instance Bar [a]  where
  MkBar4 :: Num a => a -> Bar [a]
  -- can't be a newtype because of the Num a =>

I can't write that because the instance head Bar [a] overlaps the two heads for MkBar2, MkBar3. Then I could fix that by moving those two constructor decls inside the where ... for Bar [a]. But then MkBar2 becomes a GADT (because it's result type is not Bar [a]), so can't be a newtype.

Then is being a newtype a property of the result type, rather than of the constructor? But consider the type inferred for newtype instance MkBar1 above. I can't write a top-level newtype with the same type

newtype Baz a  where
  MkBaz :: (Maybe Int) -> Baz (Maybe Int)

-- Error: A newtype constructor must have a return type of form T a1 ... an

Huh? MkBar1 is a newtype constructor whose type is not of that form.

If possible, please explain without diving into talk of roles: I've tried to understand them; it just makes my head hurt. Talk in terms of constructing and pattern-matching on those constructors.

2
What you’re saying doesn’t make sense. Different instances of a data family are still different types, so the compiler can still statically determine a value’s representation. The whole point of a newtype is to use the same runtime representation as some other type (that is, with no wrappers), but if there are other cases to the type, that clearly isn’t possible. - Alexis King
"Different instances of a data family are still different types," Hmm? They're all of type Bar a. "The whole point of a newtype ..., but if there are other cases to the type, that clearly isn’t possible. " Bar (Maybe Int), Bar [Char] are other cases of to the type, and newtype is possible -- until it gets tangled up with Bar [a]. - AntC
If you have a type family F, then surely you would not say that F Int and F Bool must be the same type, hm? The point is that GHC tracks that difference statically. It does not statically track the difference between different constructors (which is often frustrating to novice Haskellers, who regularly ask questions like “how can I have a type that is T but restricted to exclude data constructor C,” to which the only answer is “define a new data type”). - Alexis King
@AntC "They're all of type Bar a." is not precise. Try it yourself and see: make a data family with at least two instances, then write foo :: Bar a and try to define foo without using undefined. You'll find it's not possible! - Daniel Wagner

2 Answers

1
votes

You can think of data families as stronger versions of (injective and open) type families. As you can see from this question I asked a while back, data families can almost be faked with injective type families.

Does that mean that being newtype is a property of the data constructor rather than the datatype? Could there be a Haskell with something like:

data Foo a = MkFoo1 Int a
           | MkFoo2 Bool a
           | newtype MkFoo3 a

No. Being a newtype is definitively a property of the type constructor, not of the data constructor. A data family, quite like a type family, has two levels of type constructors:

  • the type constructor data/type family FamTyCon a
  • the type constructors for the instances data/type instance FamInstTyCon a = ...

Different instances of the same data/type family constructor are still fundamentally different types - they happen to be unified under one type constructor (and that type constructor is going to be injective and generative - see the linked question for more on that).

By the type family analogy, you wouldn't expect to be able to pattern match on something of type TyFam a, right? Because you could have type instance TyFam Int = Bool and type instance TyFam () = Int, and you wouldn't be able to statically know if you were looking at a Bool or an Int!

0
votes

These are equivalent:

newtype          NT a b  = MkNT (Int, b)

data family DF a b
newtype instance DF a b  = MkDF (Int, b)

-- inferred MkNT :: (Int, b) -> NT a b
--          MkDF :: (Int, b) -> DF a b

Furthermore you can't declare any other instance/constructor within family DF, because their instance heads would overlap DF a b.

The error message in the q re standalone newtype Baz is misleading

-- Error: A newtype constructor must have a return type of form T a1 ... an

(and presumably dates from before there were data families). A newtype constructor must have a return type exactly as the instance head (modulo alpha renaming if it's using GADT syntax). For a standalone newtype, "instance head" means the newtype head.

For non-newtype data instances, the various constructors might have return types strictly more specific than the instance head (that makes them GADTs).

The type declared in a data/newtype instance head (in the literature variously called a 'type scheme', 'monotype' -- because no constraints, 'Function-free type' -- in the 2008 paper 'Type Checking with Open Type Functions') is the principal type that all the constructors' return types must unify with. (There might not be any constructors with exactly that return type.)

If your instance be a newtype, the rules are much tighter: there can be only one constructor, and its return type must be exactly the principal type. So to answer the original q

Does that mean that being newtype is a property of the data constructor rather than the datatype? ... but ... Then is being a newtype a property of the result type, rather than of the constructor?

No, not of the constructor; yes more like of the result: being a newtype is the property of a data family instance/its principal type. It's just that for standalone newtypes, there can in effect be only one instance, and its principal type is the most general type for that type constructor. Derivatively, we can say the newtype's principal type/return type uniquely identifies a data constructor. That's crucial for type safety, because values of that type share their representation with the type inside the newtype's data constructor, i.e. with no wrapper -- as @AlexisKing's comment points out. Then pattern matching doesn't need to go looking for the constructor: the match is irrefutable/the constructor is virtual.

It strikes me that in the interests of more precise typing/better documentation, you might want to declare a newtype as a data family with only one instance, whose head is more specific than what you must put for a standalone newtype.