41
votes

Say I have a data type like so:

data NumCol = Empty |
              Single Int |
              Pair Int Int |
              Lots [Int]

Now I wish to filter out the elements matching a given constructor from a [NumCol]. I can write it for, say, Pair:

get_pairs :: [NumCol] -> [NumCol]
get_pairs = filter is_pair
    where is_pair (Pair _ _) = True
          is_pair _ = False

This works, but it's not generic. I have to write a separate function for is_single, is_lots, etc.

I wish instead I could write:

get_pairs = filter (== Pair)

But this only works for type constructors that take no arguments (i.e. Empty).

So the question is, how can I write a function that takes a value and a constructor, and returns whether the value matches the constructor?

3

3 Answers

33
votes

At least get_pairs itself can be defined relatively simply by using a list comprehension to filter instead:

get_pairs xs = [x | x@Pair {} <- xs]

For a more general solution of matching constructors, you can use prisms from the lens package:

{-# LANGUAGE TemplateHaskell #-}

import Control.Lens
import Control.Lens.Extras (is)

data NumCol = Empty |
              Single Int |
              Pair Int Int |
              Lots [Int]

-- Uses Template Haskell to create the Prisms _Empty, _Single, _Pair and _Lots
-- corresponding to your constructors
makePrisms ''NumCol

get_pairs :: [NumCol] -> [NumCol]
get_pairs = filter (is _Pair)
28
votes

Tags of tagged unions ought to be first-class values, and with a wee bit of effort, they are.

Jiggery-pokery alert:

{-# LANGUAGE GADTs, DataKinds, KindSignatures,
    TypeFamilies, PolyKinds, FlexibleInstances,
    PatternSynonyms
#-}

Step one: define type-level versions of the tags.

data TagType = EmptyTag | SingleTag | PairTag | LotsTag

Step two: define value-level witnesses for the representability of the type-level tags. Richard Eisenberg's Singletons library will do this for you. I mean something like this:

data Tag :: TagType -> * where
  EmptyT   :: Tag EmptyTag
  SingleT  :: Tag SingleTag
  PairT    :: Tag PairTag
  LotsT    :: Tag LotsTag

And now we can say what stuff we expect to find associated with a given tag.

type family Stuff (t :: TagType) :: * where
  Stuff EmptyTag   = ()
  Stuff SingleTag  = Int
  Stuff PairTag    = (Int, Int)
  Stuff LotsTag    = [Int]

So we can refactor the type you first thought of

data NumCol :: * where
  (:&) :: Tag t -> Stuff t -> NumCol

and use PatternSynonyms to recover the behaviour you had in mind:

pattern Empty        = EmptyT   :&  ()
pattern Single  i    = SingleT  :&  i
pattern Pair    i j  = PairT    :&  (i, j)
pattern Lots    is   = LotsT    :&  is

So what's happened is that each constructor for NumCol has turned into a tag indexed by the kind of tag it's for. That is, constructor tags now live separately from the rest of the data, synchronized by a common index which ensures that the stuff associated with a tag matches the tag itself.

But we can talk about tags alone.

data Ex :: (k -> *) -> * where  -- wish I could say newtype here
  Witness :: p x -> Ex p

Now, Ex Tag, is the type of "runtime tags with a type level counterpart". It has an Eq instance

instance Eq (Ex Tag) where
  Witness EmptyT   ==  Witness EmptyT   = True
  Witness SingleT  ==  Witness SingleT  = True
  Witness PairT    ==  Witness PairT    = True
  Witness LotsT    ==  Witness LotsT    = True
  _                ==  _                = False

Moreover, we can easily extract the tag of a NumCol.

numColTag :: NumCol -> Ex Tag
numColTag (n :& _) = Witness n

And that allows us to match your specification.

filter ((Witness PairT ==) . numColTag) :: [NumCol] -> [NumCol]

Which raises the question of whether your specification is actually what you need. The point is that detecting a tag entitles you an expectation of that tag's stuff. The output type [NumCol] doesn't do justice to the fact that you know you have just the pairs.

How might you tighten the type of your function and still deliver it?

8
votes

One approach is to use DataTypeable and the Data.Data module. This approach relies on two autogenerated typeclass instances that carry metadata about the type for you: Typeable and Data. You can derive them with {-# LANGUAGE DeriveDataTypeable #-}:

data NumCol = Empty |
          Single Int |
          Pair Int Int |
          Lots [Int] deriving (Typeable, Data)

Now we have a toConstr function which, given a value, gives us a representation of its constructor:

toConstr :: Data a => a -> Constr

This makes it easy to compare two terms just by their constructors. The only remaining problem is that we need a value to compare against when we define our predicate! We can always just create a dummy value with undefined, but that's a bit ugly:

is_pair x = toConstr x == toConstr (Pair undefined undefined)

So the final thing we'll do is define a handy little class that automates this. The basic idea is to call toConstr on non-function values and recurse on any functions by first passing in undefined.

class Constrable a where
  constr :: a -> Constr

instance Data a => Constrable a where
  constr = toConstr

instance Constrable a => Constrable (b -> a) where
  constr f = constr (f undefined)

This relies on FlexibleInstance, OverlappingInstances and UndecidableInstances, so it might be a bit evil, but, using the (in)famous eyeball theorem, it should be fine. Unless you add more instances or try to use it with something that isn't a constructor. Then it might blow up. Violently. No promises.

Finally, with the evil neatly contained, we can write an "equal by constructor" operator:

(=|=) :: (Data a, Constrable b) => a -> b -> Bool
e =|= c = toConstr e == constr c

(The =|= operator is a bit of a mnemonic, because constructors are syntactically defined with a |.)

Now you can write almost exactly what you wanted!

filter (=|= Pair)

Also, maybe you'd want to turn off the monomorphism restriction. In fact, here's the list of extensions I enabled that you can just use:

{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, NoMonomorphismRestriction, OverlappingInstances, UndecidableInstances #-}

Yeah, it's a lot. But that's what I'm willing to sacrifice for the cause. Of not writing extra undefineds.

Honestly, if you don't mind relying on lens (but boy is that dependency a doozy), you should just go with the prism approach. The only thing to recommend mine is that you get to use the amusingly named Data.Data.Data class.