2
votes

I'm writing a domain-specific language in Haskell, and have settled on a design with two ASTs: an initial untyped one that represents syntax and a final typed one that represents everything. I'm writing the final one as a GADT to get better typechecking.

I think it's almost working, but I'm having trouble writing a function that converts initial -> final (checks the types, plus some other things not shown, like all references correspond to a variable).

Here's a simplified example:

{-# LANGUAGE GADTs, StandaloneDeriving #-}

module Main where

-- untyped initial AST

data Untyped
  = UNum Int
  | UStr String
  | UAdd Untyped Untyped
  deriving (Show, Eq)

-- typed final AST

data Typed a where
  TNum :: Int    -> Typed Int
  TStr :: String -> Typed String
  TAdd :: Typed Int -> Typed Int -> Typed Int

deriving instance Eq   (Typed a)
deriving instance Show (Typed a)

-- wrapper that allows working with a `Typed a` for any `a`
data TypedExpr where
  TypedExpr :: Typed a -> TypedExpr

And this is my attempt at the check function. The base cases are simple:

check :: Untyped -> Either String TypedExpr
check (UNum n) = Right $ TypedExpr $ TNum n
check (UStr s) = Right $ TypedExpr $ TStr s
-- check (Uadd e1 e2) = ???

But how do I add Add? It can recursively evaluate the sub-expressions to values of type Either String (TypedExpr (Typed a)), but I haven't managed to unwrap those, check that the types line up (both as should be Ints), and re-wrap afterward. I planned to do it all with big pattern matches, but GHC doesn't approve:

My brain just exploded
  I can't handle pattern bindings for existential or GADT data constructors.
  Instead, use a case-expression, or do-notation, to unpack the constructor.

That's explained here, but I didn't understand the explanation. It seems I don't want pattern matching.

Update: I must have messed something else up without noticing. Pattern matching works, as shown by Nikita.

So I fiddled around trying to force things into the right shape, but haven't gotten anything substantial yet. If these were just Either String SomeValue I would want to use applicatives, right? Is it possible to add another level of unwrapping + type checking to them? I suspect this answer is close to what I want since the question is very similar, but again I don't understand it. This may also be related.

Update: That first answer is just what I wanted after all. But I couldn't see how until chi wrote the intermediate version below without the extra Type type. Here's a working solution. The trick was to tag TypedExprs with a new type representing just the return type (a) of a Typed a:

data Returns a where
  RNum :: Returns Int
  RStr :: Returns String

-- extend TypedExpr to include the return type
data TypedExpr2 where
  TypedExpr2 :: Returns a -> Typed a -> TypedExpr2

That way check doesn't have to know whether each subexpression is a raw TNum or a function (like Add) that returns a TNum:

check :: Untyped -> Either String TypedExpr2
check (UNum n) = Right $ TypedExpr2 RNum (TNum n)
check (UStr s) = Right $ TypedExpr2 RStr (TStr s)
check (UAdd u1 u2) = do
  -- typecheck subexpressions, then unwrap by pattern matching
  TypedExpr2 r1 t1 <- check u1
  TypedExpr2 r2 t2 <- check u2
  -- check the tags to find out their return types
  case (r1, r2) of
    -- if correct, create an overall expression tagged with its return type
    (RNum, RNum) -> return $ TypedExpr2 RNum $ TAdd t1 t2
    _ -> Left "type error"

GHC is smart enough to know that the two as in any TypedExpr2 must match, so it catches you if you attempt to use the wrong overall return type at the end. Wonderful!

3
How about something like this? See gergo.erdi.hu/blog/… for an extended explanation.Cactus

3 Answers

2
votes

My recommendation would be to use a "plain" representation of your universe of types (with an interpretation type function), and then store a Singleton reification of your existential type in TypedExpr, i.e. something like

{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

data Type = TInt | TString

type family InterpT (a :: Type) where
  InterpT TInt = Int
  InterpT TString = String

-- plus the usual singletons stuff
-- ...

-- and finally
data Typed (a :: Type) where ...

data TypedExpr where
  TypedExpr :: Sing a -> Typed a -> TypedExpr

This way, you can do something like

check (UAdd e1 e2) = do
  TypedExpr t1 e1' <- check e1
  TypedExpr t2 e2' <- check e2
  case testEquality t1 t2 of
    Just Refl -> ... use e1' and e2' here, you know they have the same Type
    Nothing -> Left ...

Find a fully fleshed-out example here.

0
votes

Your exact question is easily answerable with the following solution:

check (UAdd (UNum a) (UNum b)) = Right $ TypedExpr $ TAdd (TNum a) (TNum b)

However there's quite a few indications of a design maze there.

Do you realize that once you put something into TypedExpr you lose all the information about the a type? This renders your check function quite pointless.

I understand that you do so because it's the only way you can unify the types of your GADT, and otherwise you can't implement the check function. But actually it just proves that you're modeling things wrong and that GADT might be inappropriate for your use case.

I don't understand why the UAdd constructor is over Untyped values, instead of Int, and I don't understand what made you go with this multi-stage AST strategy.

I could go on, but I'll break here and just recommend you to redesign your model.

-1
votes

Technically it is doable, but it's quite inconvenient: you have to "dig" until the GADT constructor is found.

check :: Untyped -> Either String TypedExpr
check (UNum n) = return $ TypedExpr $ TNum n
check (UStr s) = return $ TypedExpr $ TStr s
check (UAdd t1 t2) = do
    t1 <- check t1
    t2 <- check t2
    case (t1, t2) of
      (TypedExpr (TNum x)     , TypedExpr (TNum y))
         -> return $ TypedExpr $ TAdd (TNum x    ) (TNum y)
      (TypedExpr (TAdd x1 x2) , TypedExpr (TNum y))
         -> return $ TypedExpr $ TAdd (TAdd x1 x2) (TNum y)
      (TypedExpr (TNum x)     , TypedExpr (TAdd y1 y2))
         -> return $ TypedExpr $ TAdd (TNum x    ) (TAdd y1 y2)
      (TypedExpr (TAdd x1 x2) , TypedExpr (TAdd y1 y2))
         -> return $ TypedExpr $ TAdd (TAdd x1 x2) (TAdd y1 y2)
      _ -> Left "type error"

I would look for alternatives. The above approach suffers from combinatorial explosion when the number of constructors is large.