6
votes

I'm writing a parser and a type checker in Haskell starting from BNFC. The main function of the type checker is implemented as follows:

typecheck :: Program -> Err ()
typecheck (PDefs ds) = do
    env  <- foldM (\env  (DFun typ id args _ _) -> 
               updateFun env id (argTypes args,typ) ) (emptyEnv) (ds)
    mapM_ (checkDef env) ds 
    where argTypes = map (\(ADecl _ typ _) -> typ)

where PDefs ,DFun, and ADecl are constructrors of algebraic data types defined in the abstract syntax of the language and checkDef and updateFun are functions. Program is the "starting point" of the grammar. The monad used is the monad Err:

    data Err a = Ok a | Bad String
       deriving (Read, Show, Eq, Ord)

    instance Monad Err where
       return      = Ok
       fail        = Bad
       Ok a  >>= f = f a
       Bad s >>= f = Bad s 

The typechecker function is called in the "main" module (where before the type check there are the lexical and the sintax analysis):

    check :: String -> IO ()
    check s = do
               case pProgram (myLexer s) of
                  Bad err -> do
                          putStrLn "SYNTAX ERROR"
                          putStrLn err
                          exitFailure
                  Ok  tree -> do 
                          case typecheck tree of
                             Bad err -> do
                                 putStrLn "TYPE ERROR"
                                 putStrLn err
                                 exitFailure
                             Ok _ -> do
                                 putStrLn "\nParsing e checking ok!"
                                 showTree tree

(tree is the abstract syntax tree build by the parser)

If there's a type error in the program passed as input the type checker returns an error saying what is wrong and it doesn't continue. Is there a way to allow the type checker to list all the errors in the input in a single execution?

1
You need to collect all the error in a list, either by changing your Err type to store a list of strings, or by "throwing" a list or Err instead of just one.mb14
I understand that but is not so simple. Using foldM the computation ends when it finds a Bad s and this doesn't change even if I use s as a list of strings. Maybe there's a way to store a error and then apply the function f defined in Bad s >>= f like in Ok a >>= f = f a?Goat16
your parser needs to not stop the computation but carry on an accumulate error. That's probably a bit tricky but doable.mb14
You probably need something similar to a Writer Monad.mb14

1 Answers

3
votes

As mostly covered in @mb14's comments, the usual method involves doing two things:

  • First, instead of returning either a type-checked tree or an error, be prepared to always return a type-checked tree together with a log of zero or more errors. This is easily accomplished with a Writer monad.
  • Second, whenever an error is detected, log the error, try to recover by assuming some valid type for the node being type-checked, and continue the type check.

In this simple scheme, the type checking always returns a typed tree. If the log of error messages is empty, the type check has succeeded, and the typed tree is valid. Otherwise, the type check has failed with the given set of errors, and the typed tree can be discarded. In a more complex scheme, you could differentiate between warnings and errors in your log, and consider the type checking to have succeeded if it contains zero or more warnings, but no errors.

I've included a complete example of the technique below for a very simplified grammar. It only returns the top-level type instead of the typed tree, but this is just to keep the code simple -- returning a type-checked tree is not difficult. The hard part in adapting it to your grammar will be determining how to forge ahead (i.e., what valid type to supply) when an error occurs, and it will be highly dependent on the details of your program. Some general techniques are illustrated below:

  • If a node always returns a particular type (e.g., Len below), always assume that type for the node, even if the node doesn't type-check.
  • If a node combines compatible types to determine its result type (e.g, Plus below, or a BNF alternation), then when a type incompatibility is detected, take the type of the node to be determined by the type of its first argument.

Anyway, here is the complete example:

import Control.Monad.Writer

-- grammar annotated with node ids ("line numbers")
type ID = String
data Exp = Num  ID Double         -- numeric literal
         | Str  ID String        -- string literal
         | Len  ID Exp           -- length of a string expression
         | Plus ID Exp Exp       -- sum of two numeric expressions
                                 -- or concat of two strings
-- expression types
data Type = NumT | StrT deriving (Show, Eq)

-- Expressions:
--    exp1 =    1 + len ("abc" + "def")
--    exp2 =    "abc" + len (3 + "def")
-- annotated with node ids
exp1, exp2 :: Exp
exp1 = Plus "T1" (Num "T2" 1)
                 (Len "T3" (Plus "T4" (Str "T5" "abc")
                                      (Str "T6" "def")))
exp2 = Plus "T1" (Str "T2" "abc")
                 (Len "T3" (Plus "T4" (Num "T5" 3)
                                      (Str "T6" "def")))
-- type check an expression
data Error = Error ID String deriving (Show)
type TC = Writer [Error]

typeCheck :: Exp -> TC Type
typeCheck (Num _ _) = return NumT
typeCheck (Str _ _) = return StrT
typeCheck (Len i e) = do
  t <- typeCheck e
  when (t /= StrT) $
    tell [Error i ("Len: applied to bad type " ++ show t)]
  return NumT  -- whether error or not, assume NumT
typeCheck (Plus i d e) = do
  s <- typeCheck d
  t <- typeCheck e
  when (s /= t) $
    tell [Error i ("Plus: incompatible types "
                   ++ show s ++ " and " ++ show t
                   ++ ", assuming " ++ show s ++ " result")]
  return s -- in case of error assume type of first arg

compile :: String -> Exp -> IO ()
compile progname e = do
  putStrLn $ "Running type check on " ++ progname ++ "..."
  let (t, errs) = runWriter (typeCheck e)
  case errs of
    [] -> putStrLn ("Success!  Program has type " ++ show t)
    _  -> putStr ("Errors:\n" ++ 
            unlines (map fmt errs) ++ "Type check failed.\n")
      where fmt (Error i s) = "   in term " ++ i ++ ", " ++ s

main :: IO ()
main = do compile "exp1" exp1
          compile "exp2" exp2

It generates the output:

Running type check on exp1...
Success!  Program has type NumT
Running type check on exp2...
Errors:
   in term T4, Plus: incompatible types NumT and StrT, assuming NumT result
   in term T3, Len: applied to bad type NumT
   in term T1, Plus: incompatible types StrT and NumT, assuming StrT result
Type check failed.