1
votes

(This is a follow-up to this answer, trying to get the q more precise.)

Use Case Constructors to build/access a Set datatype. Being a set, the invariant is 'no duplicates'. To implement that I need an Eq constraint on the element type. (More realistically, the set might be implemented as a BST or hash-index, which'll need a more restrictive constraint; using Eq here to keep it simple.)

  • I want to disallow building even an empty set with an unacceptable type.
  • " it's now considered bad practice to require a constraint for an operation (data type construction or destruction) that does not need the constraint. Instead, the constraints should be moved closer to the "usage site".", to quote that answer.
  • OK so there's no need to get the constraint 'built into' the data structure. And operations that access/deconstruct (like showing or counting elements) won't necessarily need Eq.

Then consider two (or rather five) possible designs:

(I'm aware some of the constraints could be achieved via deriving, esp Foldable to get elem. But I'll hand-code here, so I can see what minimal constraints GHC wants.)

Option 1: No constraints on datatype

data NoCSet a  where                             -- no constraints on the datatype
  NilSet_ :: NoCSet a
  ConsSet_ :: a -> NoCSet a -> NoCSet a

Option 1a. use PatternSynonym as 'smart constructor'

    pattern NilSet :: (Eq a) => () => NoCSet a
    pattern NilSet  = NilSet_

    pattern ConsSet x xs <- ConsSet_ x xs  where
      ConsSet x xs | not (elemS x xs) = ConsSet_ x xs 

    elemS x NilSet_                     = False
    elemS x (ConsSet_ y ys) | x == y    = True        -- infers (Eq a) for elemS
                            | otherwise = elemS x ys
  • GHC infers the constraint elemS :: Eq t => t -> NoCSet t -> Bool. But it doesn't infer a constraint for ConsSet that uses it. Rather, it rejects that definition:

    * No instance for (Eq a) arising from a use of `elemS'
      Possible fix:
        add (Eq a) to the "required" context of
          the signature for pattern synonym `ConsSet'
    
  • Ok I'll do that, with an explicitly empty 'Provided' constraint:

    pattern ConsSet :: (Eq a) => () => ConsType a    -- Req => Prov'd => type; Prov'd is empty, so omittable
    
  • Consequently inferred type (\(ConsSet x xs) -> x) :: Eq a => NoCSet a -> a, so the constraint 'escapes' from the destructor (also from elemS), whether or not I need it at the "usage site".

Option 1b. Pattern synonym wrapping a GADT constructor as 'smart constructor'

    data CSet a where CSet :: Eq a => NoCSet a -> CSet a   -- from comments to the earlier q

    pattern NilSetC = CSet NilSet_                         -- inferred Eq a provided
    pattern ConsSetC x xs <- CSet (ConsSet_ x xs)  where   -- also inferred Eq a provided
      ConsSetC x xs | not (elemS x xs) = CSet (ConsSet_ x xs)
  • GHC doesn't complain about the lack of signature, does infer pattern ConsSetC :: () => Eq a => a -> NoCSet a -> CSet a a Provided constraint, but empty Required.

  • Inferred (\(ConsSetC x xs) -> x) :: CSet p -> p, so the constraint doesn't escape from a "usage site".

  • But there's a bug: to Cons an element, I need to unwrap the NoCSet inside the CSet in the tail then re-wrap a CSet. And trying to do that with the ConsSetC pattern alone is ill typed. Instead:

    insertCSet x (CSet xs) = ConsSetC x xs    -- ConsSetC on rhs, to check for duplicates
    
  • As 'smart constructors' go, that's dumb. What am I doing wrong?

  • Inferred insertCSet :: a -> CSet a -> CSet a, so again the constraint doesn't escape.

Option 1c. Pattern synonym wrapping a GADT constructor as 'smarter constructor'

  • Same setup as option 1b, except this monster as ViewPattern for the Cons pattern

    pattern ConsSetC2 x xs <- ((\(CSet (ConsSet_ x' xs')) -> (x', CSet xs')) -> (x, xs))  where
      ConsSetC2 x (CSet xs) | not (elemS x xs) = CSet (ConsSet_ x xs)
    
  • GHC doesn't complain about the lack of signature, does infer pattern ConsSetC2 :: a -> CSet a -> CSet a with no constraint at all. I'm nervous. But it does correctly reject attempts to build a set with duplicates.

  • Inferred (\(ConsSetC2 x xs) -> x) :: CSet a -> a, so the constraint that isn't there doesn't escape from a "usage site".

  • Edit: ah, I can get a somewhat less monstrous ViewPattern expression to work

    pattern ConsSetC3 x xs <- (CSet (ConsSet_ x (CSet -> xs)))  where
      ConsSetC3 x (CSet xs) | not (elemS x xs) = CSet (ConsSet_ x xs)
    
  • Curiously inferred pattern ConsSetC3 :: () => Eq a => a -> CSet a -> CSet a -- so the Provided constraint is visible, unlike with ConsSetC2, even though they're morally equivalent. It does reject attempts to build a set with duplicates.

  • Inferred (\(ConsSetC3 x xs) -> x) :: CSet p -> p, so that constraint that is there doesn't excape from "usage sites".

Option 2: GADT constraints on datatype

data GADTSet a  where
  GADTNilSet  :: Eq a =>                   GADTSet a
  GADTConsSet :: Eq a => a -> GADTSet a -> GADTSet a

elemG x GADTNilSet                     = False
elemG x (GADTConsSet y ys) | x == y    = True              -- no (Eq a) 'escapes'
                           | otherwise = elemG x ys

GHC infers no visible constraint elemG :: a -> GADTSet a -> Bool; (\(GADTConsSet x xs) -> x) :: GADTSet p -> p.

Option 2a. use PatternSynonym as 'smart constructor' for the GADT

    pattern ConsSetG x xs <- GADTConsSet x xs  where
      ConsSetG x xs | not (elemG x xs) = GADTConsSet x xs -- does infer Provided (Eq a) for ConsSetG
  • GHC doesn't complain about the lack of signature, does infer pattern ConsSetG :: () => Eq a => a -> GADTSet a -> GADTSet a a Provided constraint, but empty Required.
  • Inferred (\(ConsSetG x xs) -> x) :: GADTSet p -> p, so the constraint doesn't escape from a "usage site".

Option 2b. define an insert function

    insertGADTSet x xs | not (elemG x xs) = GADTConsSet x xs   -- (Eq a) inferred
  • GHC infers insertGADTSet :: Eq a => a -> GADTSet a -> GADTSet a; so the Eq has escaped, even though it doesn't escape from elemG.

Questions

  • With insertGADTSet, why does the constraint escape? It's only needed for the elemG check, but elemG's type doesn't expose the constraint.
  • With constructors GADTConsSet, GADTNilSet, there's a constraint wrapped 'all the way down' the data structure. Does that mean the data structure has a bigger memory footprint than with ConsSet_, NilSet_?
  • With constructors GADTConsSet, GADTNilSet, it's the same type a 'all the way down'. Is the same Eq a dictionary repeated at each node? Or shared?
  • By comparison, pattern ConsSetC/constructor CSet/Option 1b wraps only a single dictionary(?), so it'll have a smaller memory footprint than a GADTSet structure(?)
  • insertCSet has a performance hit of unwrapping and wrapping CSets?
  • ConsSetC2 in the build direction seems to work; there's a performance hit in unwrapping and wrapping CSets? But worse there's a unwrapping/wrapping performance hit in accessing/walking the nodes?

(I'm feeling there's no slam-dunk winner amongst these options for my use case.)

2

2 Answers

2
votes

I don't think there is any realistic scenario where it is important that you disallow the creation of an empty set with an "unacceptable" type. It is at least partly because of lack of such realistic scenarios that DatatypeContexts is considered bad practice. Seriously, try to imagine how such a restriction could possibly help avoid real-world programming errors. That is, try to imagine how someone using your types and functions might (1) write an erroneous program that (2) "accidentally" uses a set of, say, functions and yet (3) somehow gets it to type check in a manner that (4) could have been caught if only there'd been an extra Eq constraint on NilSet. As soon as a programmer tries to do anything with that set that makes it non-empty (i.e., anything that needs Eq functionality), it won't type check, so what exactly are you trying to prevent? You want to stop someone who only needs empty sets of functions from using your types? Why? Is it spite? ... It is spite, isn't it?

Getting down to your various options, putting the constraint in a GADT is inappropriate and unnecessary to my mind. The point of constraints in GADTs is to allow destruction to dynamically and/or conditionally bring an instance dictionary into scope, based on a runtime case match. You do not need this functionality. In particular, you do not need the overhead of a dictionary in every one of your Cons nodes as per option 2. However, you also don't need a dictionary in the data type as per option 1(b). It's better to use the normal non-GADT mechanisms of passing dictionaries to functions instead of carrying them around in the data types. I expect you'll be missing many opportunities for specialization and optimization if you try option 1(b). Partly this may be because GADTs are intrinsically harder to optimize, but there's also much less work that's been put into optimizing code using GADTs than code using non-GADTs. Some of your questions suggest you're very concerned about small performance gains. If so, it's generally a good idea to stay well away from GADTs!

Option 1(a) is a reasonable solution. Without the unnecessary Eq constraint on Nil, and folding the insert function into the pattern definition, you get something like:

{-# LANGUAGE PatternSynonyms #-}

data Set a = Nil | Cons_ a (Set a) deriving (Show)

pattern Cons :: (Eq a) => a -> Set a -> Set a
pattern Cons x xs <- Cons_ x xs
  where Cons x xs = go xs
          where go Nil = Cons_ x xs
                go (Cons_ y ys) | x == y = xs
                                | otherwise = go ys

which seems like a perfectly idiomatic, straightforward, smart constructor implementation using patterns, as designed.

Indeed, it's unfortunately that the constraint applies to destruction, when it isn't really needed. Ideally, GHC would allow constraints for use of Cons as a constructor to be specified separately, instead of assuming they're the combination of the required and provided constraints for destruction.

This would allow us to write something like:

pattern Cons :: a -> Set a -> Set a
pattern Cons x xs <- Cons_ x xs
  where Cons :: Eq a => a -> Set a -> Set a  -- <== only need Eq here
        Cons x Nil = Cons_ x Nil
        Cons x rest@(Cons_ y xs) | x == y = rest
                                 | otherwise = Cons_ y (Cons x xs)

and then usages of Cons as a destructor could be constraint-free while uses as a constructor could take advantage of the Eq a constraint. I see this as a limitation of PatternSynonyms rather than an indication that adding unnecessary constraints a la DatatypeContexts is actually good programming practice. It looks like at least a few other people agree that this is a bug not a feature.

To your first four questions:

  • In option 2(b), insertGADTSet needs an Eq a dictionary to insert into the dictionary slot in the GADTConsSet constructor on the RHS. So, the Eq a constraint comes from the use of the GADTConsSet constructor.
  • Yes, GADT constraints become additional fields in the data type. In option 2, a pointer to the Eq a dictionary is added to every node in your set.
  • The dictionary itself is shared, but each node includes its own pointer to the dictionary.
  • Yes, for the CSet type, there's only one pointer to the dictionary per CSet value.
-1
votes

I believe that option 1b is your best bet. Of course, I may be biased, as I'm the one which suggested it on your other question.

To address the issue you've pointed out with your pattern synonym, let us imagine we don't have pattern synonyms. How might we deconstruct a Cons set?

One way is to write a method with this signature:

openSetC :: CSet a -> (Eq a => a -> CSet a -> r) -> (Eq a => r) -> r

Which says that given:

  • a CSet a,
  • a function which takes: a proof that Eq a, an a, and another CSet a, and returns some arbitrary type,
  • and a function which takes a proof that Eq a and returns the same arbitrary type,

we can produce a value of that arbitrary type. Since the type is arbitrary, we know that the values comes from calling the function, or from the given value of that type. The contract of this function is that it invokes the first function and return its result if and only if the set is ConsSet_, otherwise, if it is NilSet_, it invokes the second function.

If you squint a little, you can see this function is in a sense "equivalent" to pattern matching on CSet. You don't need pattern matching at all anymore; you can do everything with this function that you can do with pattern matching.

It's implementation is quite trivial:

openSetC (CSet (ConsSet_ x xs)) k _ = k x (CSet xs)
openSetC (CSet NilSet_) _ z = z

Consider now a different form of this function, which accomplishes all the same things, but is maybe a bit easier to use.

Note that the type forall r . (a -> r) -> (b -> r) -> r is isomorphic to Either a b. Also note that x0 -> y0 -> r is isomorphic (or close enough) to (x0, y0) -> r. And finally note that C a => r is isomorphic to Dict (C a) -> r where:

data Dict c where Dict :: c => Dict c

If we exploit these isomorphisms, we can write openSetC differently as:

openSetC' :: CSet a -> Either (a, CSet a, Dict (Eq a)) (Dict (Eq a))
openSetC' (CSet (ConsSet_ x xs)) = Left (x, CSet xs, Dict)
openSetC' (CSet NilSet_) = Right Dict

Now the fun part: using ViewPatterns, we can use this function directly to easily write the pattern with the signature you want. It's easy only because we've set up the type of openSetC' to match with the type of the pattern you want:

pattern ConsSetC :: () => Eq a => a -> CSet a -> CSet a
pattern ConsSetC x xs <- (openSetC' -> Left (x, xs, Dict))

   -- included for completeness, but the expression form of the pattern synonym is not at issue here
   where
     ConsSetC x (CSet xs) | not (elemS x xs) = CSet (ConsSet_ x xs)
                          | otherwise        = CSet xs

As for the rest of your questions, I would strongly suggest splitting them up into different posts so they could all have focused answers.