Extract nested type information from ConstructorInfo of generics-sop

Question

I'm trying to learn the Haskell package generics-sop and in particular I'm using the helper method constructorInfo to get process-able information about a data type.

As an exercise I'm trying to extract information from the ConstructorInfo type into a more understandable type:

module SOPExperiment where

import qualified GHC.Generics as GHC
import Generics.SOP
import Data.Text (Text, pack)

data SimpleTree = Nil
  | Node String SimpleTree SimpleTree
  deriving (Show, GHC.Generic)

instance Generic SimpleTree
instance HasDatatypeInfo SimpleTree

data Exp = Union SimpleTree
  | Intersection SimpleTree
  deriving (Show, GHC.Generic)

instance Generic Exp
instance HasDatatypeInfo Exp

I've been able to piece together this partial solution consNames, that lists constructors of Exp:

-- This will give ["Union","Intersection"]
expConsNames = consNames (Proxy :: Proxy Exp)

-- And this ["Nil","Node"]
treeConsNames = consNames (Proxy :: Proxy SimpleTree)

consNames :: HasDatatypeInfo a => Proxy a -> [Text]
consNames p = hcollapse $ hliftA constructorNames cI :: [Text]
  where
  dI = datatypeInfo p
  cI = constructorInfo dI
  constructorNames :: ConstructorInfo xs -> K Text xs
  constructorNames cInfo =
      let cName = case cInfo of
              Constructor n -> n
              Infix n _ _   -> n
              Record n _    -> n
      in K . pack $ cName

But how to return a list that includes nested constructor names?

-- Should return ["Union","Intersection", "Nil", "Node"]
nestedExpConsNames = undefined

Or slightly better, return a tree data type that represent how constructors are nested in the type?

data Tree = Tree String [Tree] deriving Show
expDataStructure :: Tree
expDataStructure = undefined

{- Result should be something akin to (infinite tree): 
expDataStructure' = Tree "Exp" [
  Tree "Union" [
    Tree "SimpleTree" [Tree "Nil" [], Tree "Node" [Tree "SimpleTree" [...]]],
    Tree "SimpleTree" [Tree "Nil" [], Tree "Node" [Tree "SimpleTree" [...]]]
  ],
  Tree "Intersection" [
    Tree "SimpleTree" [Tree "Nil" [], Tree "Node" [Tree "SimpleTree" [...]]],
    Tree "SimpleTree" [Tree "Nil" [], Tree "Node" [Tree "SimpleTree" [...]]]
  ]]
  -}

Used GHC Extensions are: OverloadedStrings, ScopedTypeVariables, TypeFamilies, RankNTypes, TypeOperators, ConstraintKinds, MultiParamTypeClasses, TypeSynonymInstances, FlexibleInstances, FlexibleContexts, DefaultSignatures, DataKinds, DeriveGeneric, ExtendedDefaultRules

(All extensions might not be needed)

The problem here is that sub-types of a given constructor are not guaranteed to also have instances of HasDatatypeInfo, so you can't necessarily traverse them in the same way. You can get access to the sub-types themselves via the xs list in ConstructorInfo xs, and you can use para_SList to fold over them, but you can't query whether each type has a HasDatatypeInfo instance as you do that. This is a pretty standard problem when trying to do dependent type-ish things in Haskell. There is no real RTTI. It may sometimes almost look like there is, but it's all smoke and mirrors. — Fyodor Soikin
Hmm, all the necessary type information should be available at compile time right? I'm thinking I should be able to constrain with All or All2 constraints to HasDatatypeInfo types. — worldsayshi
ConstructorInfo is at the term level, that's where types stop flowing. You can't require that the types wrapped inside ConstructorInfo have any constraints which are not already part of ConstructorInfo definition. And HasDatatypeInfo cannot be part of it, because then it would be impossible to construct ConstructorInfo for constructors that wrap any HasDatatypeInfo-lacking types. — Fyodor Soikin
I'm puzzled by how HasDatatypeInfo is different to classes like Show in this regard. Deriving Show only works when all nested types are also in Show. Can I somehow make a class HasDatatypeInfo2 that enforce my instances to implement HasDatatypeInfo "all the way down"? — worldsayshi
There's a lot here. First, the difference is that the compiler can automatically generate (i.e "derive") instances of Show for you. If you ask the compiler to do so, it will indeed require that all included types already have a Show instance. But only immediately included ones, not all the way down. If, on the other hand, you don't ask the compiler to derive Show instances for you, but implement them manually instead, then there is no requirement of "all the way down". The class itself does not have this requirement, only the compiler's derivation mechanism. — Fyodor Soikin

kosmikus kosmikus · Accepted Answer · 2019-11-21T19:56:02

A variant of the non-recursive function is required to define the recursive version, so let's start with that:

conNames_NP :: (Generic a, HasDatatypeInfo a) => Proxy a -> NP (K Text) (Code a)
conNames_NP p =
  map_NP
    (K . pack . constructorName)
    (constructorInfo (datatypeInfo p))

It's mostly what you wrote, but the case distinction is not needed, because constructorName is provided by the library.

Now let's define the recursive tree-based version, which will in general return an infinite tree. Also, in your example, datatype names suddenly appear in the tree structure. The semantics of that is unclear to me, so I'll produce a tree with only constructor names.

data Tree = Tree Text [Tree]

You need an additional class for all datatypes supporting this operation. This is because, as already brought up in the comments, you cannot (or better, don't want to) assume that every type recursively is an instance of Generic and HasDatatypeInfo. Primitive types such as Char or Text are not instances of these classes, yet they will probably occur in your datatypes somewhere.

So we define

class ConNamesRecursive a where
  conNamesRecursive :: Proxy a -> [Tree]
  default conNamesRecursive ::
    (Generic a, HasDatatypeInfo a, All2 ConNamesRecursive (Code a))
    => Proxy a -> [Tree]
  conNamesRecursive = gconNamesRecursive

The proxy is a matter of taste, as I'll be using type applications in the code below. I like to have proxies around as a reminder that a type argument will be needed.

The definition of gconNamesRecursive is only tricky because constructor names are not intrinsically tied to the type they are a constructor from, so we have to somehow pass the type info correctly. I do this by introducing a number of helper functions:

gconNamesRecursive ::
  forall a . (Generic a, HasDatatypeInfo a, All2 ConNamesRecursive (Code a))
  => Proxy a -> [Tree]
gconNamesRecursive p =
  collapse_NP (cmap_NP (Proxy @(All ConNamesRecursive)) go (conNames_NP p))
  where
    go :: forall xs . All ConNamesRecursive xs => K Text xs -> K Tree xs
    go = mapKK (\ cn -> Tree cn (concat (collapse_NP (conNamesRecursive_NP @xs))))

The generic definition starts from conNames_NP which is an n-ary product containing each constructor name for the top-level type, i.e., a NP (K Text) (Code a). If we map over this (using cmap_NP), we have to say what to do with the code corresponding to each constructor. This is handled by go. That function turns the name into the root of a Tree, and the children are given by calling conNamesRecursive_NP, which is defined as follows:

conNamesRecursive_NP :: forall xs . All ConNamesRecursive xs => NP (K [Tree]) xs
conNamesRecursive_NP =
  cpure_NP (Proxy @ConNamesRecursive) go
  where
    go :: forall a . ConNamesRecursive a => K [Tree] a
    go = K (conNamesRecursive (Proxy @a))

This is a function just calling conNamesRecursive again per component of the constructor. Because this function is a member of the type class, we have to require in the beginning that all components occurring in the code of the original datatype are themselves members of this class.

Before we can use this on concrete example types, we have to make them (and all types that occur somewhere inside) an instance of ConNamesRecursive.

We can do this by just using DeriveAnyClass:

data SimpleTree =
    Nil
  | Node Text SimpleTree SimpleTree
  deriving (Show, GHC.Generic, Generic, HasDatatypeInfo, ConNamesRecursive)

data Exp =
    Union SimpleTree
  | Intersection SimpleTree
  deriving (Show, GHC.Generic, Generic, HasDatatypeInfo, ConNamesRecursive)

Note that I changed String to Text in Node so that I don't have to decide what to do with lists of characters, but can just treat Text as an abstract type. Abstract types can be made members of the ConNamesRecursive class producing an empty list of trees.

instance ConNamesRecursive Text where
  conNamesRecursive _ = []

Finally, in order to make this a bit easier to visualise, I implemented a (naive) unDup function on trees that stops if a constructor occurs underneath itself:

unDup :: [Tree] -> [Tree]
unDup = go Set.empty
  where
    go :: Set.Set Text -> [Tree] -> [Tree]
    go seen ts =
      map
        (\ (Tree cn children) ->
          Tree cn (if cn `Set.member` seen then [] else go (Set.insert cn seen) children)
        )
        ts

Now we can test:

GHCi> unDup $ conNamesRecursive (Proxy @Exp)
[Tree "Union"
  [Tree "Nil" []
  ,Tree "Node"
    [Tree "Nil" []
    ,Tree "Node" []
    ,Tree "Nil" []
    ,Tree "Node" []
    ]
  ]
  ,Tree "Intersection"
    [Tree "Nil" []
    ,Tree "Node"
      [Tree "Nil" []
      ,Tree "Node" []
      ,Tree "Nil" []
      ,Tree "Node" []
    ]
  ]
]

(Layout added for readability.)

Of course, all sorts of variations are possible to implement here.

As I said before, this function is somewhat trickier to define than most generic functions, because we're neither consuming a value of the type we analyze nor are we producing a value of the type we analyze, but we're producing a value of a constant type [Tree], so there's little type information for GHC to go on, which means more type parameters / helper functions with type signatures than usual.

I hope this nevertheless gives you an idea.

Extract nested type information from ConstructorInfo of generics-sop

1 Answers