Is there a name for ADT with explicit subtyping?

Question

I'm looking for a proper name for a data type that combines ADT with explicit subtyping.

In one of my applications, I use a structure similar to ADT to represent parse trees, on which I perform recursive pattern matching. I find it rather convenient if I could combine ADT with subtyping, as demonstrated in the example below:

Note: the example is written in Haskell's syntax, but this is not Haskell code.

data Empty = Empty
data Expr = Int Int | Add Expr AddOp Expr
data OptionalExpr =
| Empty  // I want to make Empty a subtype of OptionalExpr
| Expr   // I want to make Expr  a subtype of OptionalExpr

In the example above, I first define 2 types: Empty and Expr. Then I make these 2 types the subtype of OptionalExpr. I realize this kind of data type is uncommon. Apparently neither Haskell nor OCaml support it. But I don't know about other functional languages.

I am looking for something that combines ADT with explicit subtyping, not structurally-implied subtyping as in polymorphic variant. There are a few justifications for this requirement:

First, we want all-or-none subtyping. Say we want A to be a subtype of B, then we will never want to include only some of the variants of A under B. Either A is a subtype of B, in which case B includes all the variants of A, or A is not a subtype of B, in which case B includes none of the variants of A. We don't allow gray-area in between.
Second, we don't want B to be open in any sense. We have in mind a very specific set of subtypes of B. We don't want something to become an instance of B just by implementing a typeclass or the like.
Third, say type A has a large number of variants. We want to make type B a supertype of A. Copying all the variants into B, as is required with polymorphic variant, is just too cumbersome and error-prone.
Fourth, we don't want to introduce new value-constructors when all we want to express is a subtype. In the example above, we could have written OptionalExpr as an ADT with 2 value-constructors, like this: data OptionalExpr = EmptyExpr | NonEmptyExpr Expr, or we could have used Maybe, but in my application this is unacceptable, because the level of embedding can be quite deep, and it would be a nightmare to deconstruct an deeply embedded value like (L1 (L2 (L3 (L4 (L5 value_wanted))))).

To give you some idea why such requirements exist, I show a more specific example below:

PrimaryExpr = ID | LeftParen Expr RightParen
UnaryExpr = PrimaryExpr | NegateOp PrimaryExpr    // -
MultExpr  = UnaryExpr | MultExpr MultOp UnaryExpr // *
AddExpr   = MultExpr  | AddExpr  AddOp MultExpr   // +
CompExpr  = AddExpr | AddExpr CompOp AddExpr
Expr = CompExpr

The above example expresses a subtype hierarchy, and expresses ideas such as AddExpr is a CompExpr, but a CompExpr is not an AddExpr. For this specific example, some people have suggested to me that I can replace UnaryExpr, MultExpr, AddExpr and so on with just Expr. That is, I can define all the types as a single type. That loses type constraints such as CompExpr is not AddExpr, and because I'm doing recursive pattern matching on these types, I need that constraints of this hierarchy to be statically enforced.

Is there a name for this kind of data type I'm looking for in the literature? Or am I looking for something that doesn't even make sense? If you think this is the case, why am I looking for something nonsensical? Thanks for any pointers.

EDIT: even though I've written the above code snippets in Haskell's syntax, I am not writing my application in Haskell. I'm using my own language and my own data types, so I am not limited by Haskell's semantics. I am looking for a pointer to similar concepts in literature, so that when I write a report for my project I don't appear to be reinventing something new. I tried all the google keywords I can think of and nothing right was returned so I'm asking here.

It's not clear if you realize Haskell won't let you do this. — Louis Wasserman
Agreed. I'm tempted to close as off topic since the answer is "no" and subjectively "yes" to the two questions. — Thomas M. DuBuisson
@ThomasM.DuBuisson When you say the answer is no, do you mean no one has ever attempted something similar? If that is the case, is there any reason as to why this approach is nonsensical? — neuron
I assume you already know about Generalized algebraic datatypes (GADTs)? — mb21

Daniel Wagner Daniel Wagner · Accepted Answer · 2016-04-17T18:34:12

In a comment, you say:

I'm not sure how to encode a subtype hierarchy using GADTs. If you think it is doable, would you mind providing an answer with an example as to how the type hierarchy given in my example may be encoded?

Therefore I give an answer to this question here. The key idea is to give a type-level function (in the host language, here Haskell) for computing the subtyping relation (of the target language's type system, here your custom EDSL). For simplicity, I will spell out the subtyping relation in full, but standard type-level programming can be used to reduce the repetition and raise the abstraction level as appropriate. First, the extensions needed:

{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}

Now the definition of the subtyping relation:

data Level = Primary | Unary | Mult | Add | Comp

type family Subtype a b where
    Subtype Primary a       = True
    Subtype Unary   Primary = False
    Subtype Unary   a       = True
    Subtype Mult    Primary = False
    Subtype Mult    Unary   = False
    Subtype Mult    a       = True
    Subtype Add     Add     = True
    Subtype Add     Comp    = True
    Subtype Add     a       = False
    Subtype Comp    Comp    = True
    Subtype Comp    a       = False

A closed type family is used to guarantee that the subtyping relation cannot be expanded by clients (your second property). Finally, the GADT for target language terms can use the subtyping relation as a constraint on its constructors.

data Expr a where
    ID      :: Subtype Primary a ~ True => Expr a
    Paren   :: Subtype Primary a ~ True => Expr b -> Expr a
    Negate  :: Subtype Unary   a ~ True => Expr Unary -> Expr a
    Times   :: Subtype Add     a ~ True => Expr Mult -> Expr Mult -> Expr a
    Plus    :: Subtype Add     a ~ True => Expr Add -> Expr Add -> Expr a
    Compose :: Subtype Comp    a ~ True => Expr Comp -> Expr Comp -> Expr a

Note that because the argument to Paren is polymorphic, you will need a type annotation on the contained term to express which "level" of the subtyping hierarchy you want that term to be treated as. I would expect you would need to do this in whatever language you are designing as well. In ghci, we can ask for the type of a sample term:

:t Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
Compose (Times ID ID) (Negate (Paren (Plus ID ID :: Expr Add)))
  :: (Subtype 'Comp a ~ 'True) => Expr a

This is more or less the type you would expect for this term, I think. You can also see that the expression hierarchy is strictly enforced, though I dare say the error message is not 100% clear (since it is written in host language terms and not target language terms):

:t Negate (Plus ID ID)

<interactive>:1:9:
    Couldn't match type ‘'False’ with ‘'True’
    Expected type: 'True
      Actual type: Subtype 'Add 'Unary
    In the first argument of ‘Negate’, namely ‘(Plus ID ID)’
    In the expression: Negate (Plus ID ID)

Is there a name for ADT with explicit subtyping?

4 Answers