56
votes

I'm confused about how to choose between data family and type family. The wiki page on TypeFamilies goes into a lot of detail. Occasionally it informally refers to Haskell's data family as a "type family" in prose, but of course there is also type family in Haskell.

There is a simple example that shows where two versions of code are shown, differing only on whether a data family or a type family is being declared:

-- BAD: f is too ambiguous, due to non-injectivity
-- type family F a
 
-- OK
data family F a 

f :: F a -> F a
f = undefined
 
g :: F Int -> F Int
g x = f x

type and data here have the same meaning, but the type family version fails to type-check, while the data family version is fine, because data family "creates new types and are therefore injective" (says the wiki page).

My takeaway from all of this is "try data family for simple cases, and, if it isn't powerful enough, then try type family". Which is fine, but I'd like to understand it better. Is there a Venn diagram or a decision tree I can follow to distinguish when to use which?

2

2 Answers

67
votes

(Boosting useful information from comments into an answer.)

Standalone vs In-Class declaration

Two syntactically different ways to declare a type family and/or data family, that are semantically equivalent:

standalone:

type family Foo
data family Bar

or as part of a typeclass:

class C where
   type Foo 
   data Bar 

both declare a type family, but inside a typeclass the family part is implied by the class context, so GHC/Haskell abbreviates the declaration.

"New type" vs "Type Synonym" / "Type Alias"

data family F creates a new type, similar to how data F = ... creates a new type.

type family F does not create a new type, similar to how type F = Bar Baz doesn't create a new type (it just creates an alias/synonym to an existing type).

Example of non-injectivity of type family

An example (slightly modified) from Data.MonoTraversable.Element:

import Data.ByteString as S
import Data.ByteString.Lazy as L

-- Declare a family of type synonyms, called `Element` 
-- `Element` has kind `* -> *`; it takes one parameter, which we call `container`
type family Element container

-- ByteString is a container for Word8, so...
-- The Element of a `S.ByteString` is a `Word8`
type instance Element S.ByteString = Word8 

-- and the Element of a `L.ByteString` is also `Word8`
type instance Element L.ByteString = Word8  

In a type family, the right-side of equations Word8 names an existing type; the things are the left-side creates new synonyms: Element S.ByteString and Element L.ByteString

Having a synonym means we can interchange Element Data.ByteString with Word8:

-- `w` is a Word8....
>let w = 0::Word8

-- ... and also an `Element L.ByteString`
>:t w :: Element L.ByteString
w :: Element L.ByteString :: Word8

-- ... and also an `Element S.ByteString`
>:t w :: Element S.ByteString
w :: Element S.ByteString :: Word8

-- but not an `Int`
>:t w :: Int
Couldn't match expected type `Int' with actual type `Word8'

These type synonyms are "non-injective" ("one-way"), and therefore non-invertible.

-- As before, `Word8` matches `Element L.ByteString` ...
>(0::Word8)::(Element L.ByteString)

-- .. but  GHC can't infer which `a` is the `Element` of (`L.ByteString` or `S.ByteString` ?):

>(w)::(Element a)
Couldn't match expected type `Element a'
            with actual type `Element a0'
NB: `Element' is a type function, and may not be injective
The type variable `a0' is ambiguous

Worse, GHC can't even solve non-ambiguous cases!:

type instance Element Int = Bool
> True::(Element a)
> NB: `Element' is a type function, and may not be injective

Note the use of "may not be"! I think GHC is being conservative, and refusing to check whether the Element truly is injective. (Perhaps because a programmer could add another type instance later, after importing a pre-compiled module, adding ambiguity.

Example of injectivity of data family

In contrast: In a data family, each right-hand side contains a unique constructor , so the definitions are injective ("reversible") equations.

-- Declare a list-like data family
data family XList a

-- Declare a list-like instance for Char
data instance XList Char = XCons Char (XList Char) | XNil

-- Declare a number-like instance for ()
data instance XList () = XListUnit Int

-- ERROR: "Multiple declarations of `XListUnit'"
data instance XList () = XListUnit Bool
-- (Note: GHCI accepts this; the new declaration just replaces the previous one.)

With data family, seeing the constructor name on the right (XCons, or XListUnit) is enough to let the type-inferencer know we must be working with XList () not an XList Char. Since constructor names are unique, these defintions are injective/reversible.

If type "just" declares a synonym, why is it semantically useful?

Usually, type synonyms are just abbreviations, but type family synonyms have added power: They can make a simple type (kind *) become a synonym of a "type with kind * -> * applied to an argument":

type instance F A = B

makes B match F a. This is used, for example, in Data.MonoTraversable to make a simple type Word8 match functions of the type Element a -> a (Element is defined above).

For example, (a bit silly), suppose we have a version of const that only works with "related" types:

> class Const a where constE :: (Element a) -> a -> (Element a)
> instance Const S.ByteString where constE = const

> constE (0::Word8) undefined
ERROR: Couldn't match expected type `Word8' with actual type `Element a0'

-- By providing a match `a = S.ByteString`, `Word8` matches `(Element S.ByteString)`
> constE (0::Word8) (undefined::S.ByteString)  
0

-- impossible, since `Char` cannot match `Element a` under our current definitions.
> constE 'x' undefined 
35
votes

I don't think any decision tree or Venn diagram will exist because the applications for type and data families are pretty wide.

Generally you've already highlighted the key design differences and I would agree with your takeaway to first see if you can get away with data family.

For me the key point is that each instance of a data family creates a new type, which does substantially limit the power because you can't do what is often the most natural thing and make an existing type be the instance.

For example the GMapKey example on the Haskell wiki page on "indexed types" is a reasonably natural fit for data families:

class GMapKey k where
  data GMap k :: * -> *
  empty       :: GMap k v
  lookup      :: k -> GMap k v -> Maybe v
  insert      :: k -> v -> GMap k v -> GMap k v

The key type of the map k is the argument to the data family and the actual map type is the result of the data family (GMap k) . As a user of a GMapKey instance you're probably quite happy for the GMap k type to be abstract to you and just manipulate it via the generic map operations in the type class.

In contrast the Collects example on the same wiki page is sort of the opposite:

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  member :: Elem ce -> ce -> Bool
  toList :: ce -> [Elem ce]

The argument type is the collection and the result type is the element of the collection. In general a user is going to want to operate on those elements directly using the normal operations on that type. For example the collection might be IntSet and the element might be Int. Having the Int be wrapped up in some other type would be quite inconvenient.

Note - these two examples are with type classes and therefore don't need the family keyword as declaring a type inside a type class implies it must be a family. Exactly the same considerations apply as for standalone families though, it's just a question of how the abstraction is organised.