24
votes

In my project I have created a data type, that can hold one of a few types of values:

data PhpValue = VoidValue | IntValue Integer | BoolValue Bool

What I wanted to do now, is to have a simple way of checking if two values of the PhpValue type are of the same constructor (correct me if I'm confused with the terminology here, but basically what I want to check if both are, for example, are IntValue, without caring about the particular value).

Here is a function I wrote for that:

sameConstructor :: PhpValue -> PhpValue -> Bool
sameConstructor VoidValue VoidValue = True
sameConstructor (IntValue _) (IntValue _) = True
sameConstructor (BoolValue _) (BoolValue _) = True
sameConstructor _ _ = False

This works as it should, but I don't really like it: if I add more constructors (like FloatValue Float) I am going to have to rewrite the function, and it will get bigger as my data definition gets bigger.

The Question: Is there a way of writing such a function, so that its implementation doesn't change when I add more constructors?

For the record: I don't want to change the data definition, I have enough Monads in the rest of my code as it is ;)

6
You should replace arguments you never use with _. So sameConstructor sth els = False can better be written as sameCOnstructor _ _ = False and so on. This makes the fact that you're not going to use those values clearer.Tikhon Jelvis
You can replace (IntValue a) and others with (IntValue _) as well.sdcvvc

6 Answers

22
votes

Take a look at Data.Data and its toConstr function. This returns a representation of the constructor which can be compared for equality.

With an extension (you can put {-# LANGUAGE DeriveDataTypeable #-} at the top of your module), you can have a Data instance derived for you automatically:

data PhpValue = VoidValue | IntValue Integer | BoolValue Bool 
              deriving (Typeable, Data)

You should then be able to use the toConstr function to compare by constructor.

Now the following will be true:

toConstr (BoolValue True) == toConstr (BoolValue False)

Using on from Data.Function you can now rewrite sameConstructor to:

sameConstructor = (==) `on` toConstr

This is the same as

sameConstructor l r = toConstr l == toConstr r

I think the version using on is easier to read at a glance.

5
votes

This is known as the expression problem in Haskell and ML-family languages; there are a number of unsatisfactory solutions (including using Data.Typeable and abusing typeclasses, in Haskell) but no nice solutions.

2
votes

Since the definition follows a regular format, you can use Template Haskell to automatically derive such a function for any datatype. I went ahead and wrote a simple package for this since I wasn't fully satisfied with the existing solutions.

First, we define a class

class EqC a where
    eqConstr :: a -> a -> Bool
    default eqConstr :: Data a => a -> a -> Bool
    eqConstr = (==) `on` toConstr

and then a function deriveEqC :: Name -> DecsQ which will automatically generate instances for us.

The default is a default signature, and means that when the type is an instance of Data we can omit the definition of eqConstr, and fall back to Tikhon's implementation.

The benefit of Template Haskell is that it produces a more efficient function. We can write $(deriveEqC ''PhpValue) and get an instance that is exactly what we'd write by hand. Take a look at the generated core:

$fEqCPhpValue_$ceqConstr =
  \ ds ds1 ->
    case ds of _ { 
      VoidValue ->
        case ds1 of _ { 
          __DEFAULT -> False;
          VoidValue -> True
        };  
      IntValue ds2 ->
        case ds1 of _ { 
          __DEFAULT -> False;
          IntValue ds3 -> True
        };  
      BoolValue ds2 ->
        case ds1 of _ { 
          __DEFAULT -> False;
          BoolValue ds3 -> True
        }   
    }  

In contrast, using Data introduces a good deal of extra indirection by reifying an explicit Constr for each argument before comparing them for equality:

eqConstrDefault =
  \ @ a $dData eta eta1 ->
    let {
      f
      f = toConstr $dData } in
    case f eta of _ { Constr ds ds1 ds2 ds3 ds4 ->
    case f eta1 of _ { Constr ds5 ds6 ds7 ds8 ds9 ->
    $fEqConstr_$c==1 ds ds5
    }
    }

(There's a lot of other bloat involved in computing toConstr that's not worth showing)

In practice this leads to the Template Haskell implementation being about twice as fast:

benchmarking EqC/TH
time                 6.906 ns   (6.896 ns .. 6.915 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.903 ns   (6.891 ns .. 6.919 ns)
std dev              45.20 ps   (32.80 ps .. 63.00 ps)

benchmarking EqC/Data
time                 14.80 ns   (14.77 ns .. 14.82 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 14.79 ns   (14.77 ns .. 14.81 ns)
std dev              60.17 ps   (43.12 ps .. 93.73 ps)
1
votes

One popular alternative to Data is Generic. I think Data probably makes more sense in this context, but I figured it would make sense to add this just for completeness.

{-# LANGUAGE DefaultSignatures, TypeOperators, FlexibleContexts #-}
module SameConstr where

import GHC.Generics
import Data.Function (on)

class EqC a where
    eqConstr :: a -> a -> Bool
    default eqConstr :: (Generic a, GEqC (Rep a)) => a -> a -> Bool
    eqConstr = geqConstr `on` from

class GEqC f where
  geqConstr :: f p -> f p -> Bool
  {-# INLINE geqConstr #-}
  geqConstr _ _ = True

instance GEqC f => GEqC (M1 i c f) where
  {-# INLINE geqConstr #-}
  geqConstr (M1 x) (M1 y) = geqConstr x y

instance GEqC (K1 i c)
instance GEqC (f :*: g)
instance GEqC U1
instance GEqC V1

instance (GEqC f, GEqC g) => GEqC (f :+: g) where
  {-# INLINE geqConstr #-}
  geqConstr (L1 x) (L1 y) = geqConstr x y
  geqConstr (R1 x) (R1 y) = geqConstr x y
  geqConstr _ _ = False
0
votes

In your special case you can use the Show magic of the compiler:

data PhpValue = VoidValue | IntValue Integer | BoolValue Bool deriving Show

sameConstructor v1 v2 = cs v1 == cs v2 where 
   cs = takeWhile (/= ' ') . show

Of course depending on the string representation generated by the compiler is very close to a hack...

0
votes

If you don't want to use any of the reasonable ways in the other answers, you can use a completely unsupported way that is guaranteed to be fast but not actually guaranteed to give correct results or even not to crash. Note that this will even be happy to try to compare functions, for which it will give utterly bogus results.

{-# language MagicHash, BangPatterns #-}

module DangerZone where

import GHC.Exts (Int (..), dataToTag#)
import Data.Function (on)

{-# INLINE getTag #-}
getTag :: a -> Int
getTag !a = I# (dataToTag a)

sameConstr :: a -> a -> Bool
sameConstr = (==) `on` getTag

One other problem (arguably) is that this peers through newtypes. So if you have

newtype Foo a = Foo (Maybe a)

then

sameConstr (Foo (Just 3)) (Foo Nothing) == False

even though they're built with the Foo constructor. You can work around this by using a bit of the machinery in GHC.Generics, but without the runtime cost associated with using unoptimized generics. This gets pretty hairy!

{-# language MagicHash, BangPatterns, TypeFamilies, DataKinds,
             ScopedTypeVariables, DefaultSignatures #-}

import Data.Proxy (Proxy (..))
import GHC.Generics
import Data.Function (on)
import GHC.Exts (Int (..), dataToTag#)

--Define getTag as above

class EqC a where
  eqConstr :: a -> a -> Bool
  default eqConstr :: forall i q r s nt f.
                      ( Generic a
                      , Rep a ~ M1 i ('MetaData q r s nt) f
                      , GNT nt)
                   => a -> a -> Bool
  eqConstr = genEqConstr

-- This is separated out to work around a bug in GHC 8.0
genEqConstr :: forall a i q r s nt f.
                      ( Generic a
                      , Rep a ~ M1 i ('MetaData q r s nt) f
                      , GNT nt)
                   => a -> a -> Bool
genEqConstr = (==) `on` modGetTag (Proxy :: Proxy nt)

class GNT (x :: Bool) where
  modGetTag :: proxy x -> a -> Int

instance GNT 'True where
  modGetTag _ _ = 0

instance GNT 'False where
  modGetTag _ a = getTag a

The key idea here is that we look at the type-level metadata associated with the generic representation of the type to determine whether or not it's a newtype. If it is, we report its "tag" as 0; otherwise we use its actual tag.