2
votes

I have a set of Ocaml types to represent a syntax tree. There are types for the program, classes, methods, expressions etc. For example, a method is represented by a record type like this:

type method = { return:typeid; args:typeid list; body:expr } 

It includes a return type, a type for each argument, and the body definition. I want to type check the syntax tree, and produce a new kind of tree that looks very much like the old one, except every expression has an explicit typeid (known only after type-checking) associated with it.

One option is to declare a parallel set of types:

type typed_expr = expr * typeid
type typed_method = { return:typeid; args:typeid list; body:typed_expr }
(* ... there are more types *)

The typed_method is necessary because the typed_expr is a different type. But I don't want to maintain two almost identical sets of types for an unchecked AST and a checked AST.

An alternative approach is to define expression as follows:

type expr = {...; typ:typeid option}

This enables me to use the same type definitions for both the input to the checker and the output. The difference is that I move lots of checks to the consumer code of the checked syntax tree. There is a contract here that the typ field will never be None in the type-checker output, and will always be None in the type-checker input.

Now, every time I used the typed tree, the only way to access the typ field's inner value is to first check if it is None (which it shouldn't be). This makes all later consumer code ugly because of the extra checks.

Neither of these approaches seems satisfying to me. How would you model this?

1

1 Answers

4
votes

The first is better then the second: it may be lousy to have 2 sets of data types which look similar, but it is safe: the invariant you have to take care of in the second approach is solved by the types. Actually OCaml compiler implementation takes this approach: see parsetree.mli and typedtree.mli.

Between the first and the second, you may want to define data types whose typ fields are parameterized:

type 'typ expr = { ...; typ : 'typ }

Then you can use unit expr for untyped AST and typeid expr for typed AST.

I still prefer the first approach of having diffrent sets of data types for untyped and typed, since it is often the case that ASTs of the two worlds can have some other differences than types.