19
votes

I am implementing Algorithm W (the Hindley-Milner type system) in JavaScript:

Algorithm W

The function which implements the above rules is typecheck and it has the following signature:

typecheck :: (Context, Expr) -> Monotype

It is defined as follows:

function typecheck(context, expression) {
    switch (expression.type) {
    case "Var":
        var name = expression.name;
        var type = context[name];
        return inst(type);
    case "App":
        var fun = typecheck(context, expression.fun);
        var dom = typecheck(context, expression.arg);
        var cod = new Variable;
        unify(fun, abs(dom, cod));
        return cod;
    case "Abs":
        var param = expression.param;
        var env   = Object.create(context);
        var dom   = env[param] = new Variable;
        var cod   = typecheck(env, expression.result);
        return abs(dom, cod);
    case "Let":
        var assignments = expression.assignments;
        var env = Object.create(context);

        for (var name in assignments) {
            var value = assignments[name];
            var type  = typecheck(context, value);
            env[name] = gen(context, type);
        }

        return typecheck(env, expression.result);
    }
}

A little bit about the data types:

  1. A context is an object which maps identifiers to polytypes.

    type Context = Map String Polytype
    
  2. An expression is defined by the following algebraic data type:

    data Expr = Var { name        :: String                          }
              | App { fun         :: Expr,            arg    :: Expr }
              | Abs { param       :: String,          result :: Expr }
              | Let { assignments :: Map String Expr, result :: Expr }
              | Rec { assignments :: Map String Expr, result :: Expr }
    

In addition, we have the following functions which are required by the algorithm but are not essential to the question:

inst ::  Polytype -> Monotype
abs  :: (Monotype,   Monotype) -> Monotype
gen  :: (Context,    Monotype) -> Polytype

The inst function specializes a polytype and the gen function generalizes a monotype.

Anyway, I want to extend my typecheck function to allow recursive definitions as well:

Recursive definitions

Where:

  1. Recursive definition context one
  2. Recursive definition context two

However, I am stuck with a chicken and egg problem. Context number one has the assumptions v_1 : τ_1, ..., v_n : τ_n. Furthermore, it implies e_1 : τ_1, ..., e_n : τ_n. Hence, you first need to create the context in order to find the types of e_1, ..., e_n but in order to create the context you need to find the types of e_1, ..., e_n.

How do you solve this problem? I was thinking of assigning new monotype variables to the identifiers v_1, ..., v_n and then unifying each monotype variable with its respective type. This is the method that OCaml uses for its let rec bindings. However, this method does not yield the most general type as is demonstrated by the following OCaml code:

$ ocaml
        OCaml version 4.02.1

# let rec foo x = foo (bar true)     
  and bar x = x;;
val foo : bool -> 'a = <fun>
val bar : bool -> bool = <fun>

However, GHC does compute the most general type:

$ ghci
GHCi, version 7.10.1: http://www.haskell.org/ghc/  :? for help
Prelude> let foo x = foo (bar True); bar x = x
Prelude> :t foo
foo :: Bool -> t
Prelude> :t bar
bar :: t -> t

As you can see, OCaml infers the type val bar : bool -> bool while GHC infers the type bar :: t -> t. How does Haskell infer the most general type of the function bar?

I understand from @augustss' answer that type inference for recursive polymorphic functions is undecidable. For example, Haskell cannot infer the type of the following size function without additional type annotations:

data Nested a = Epsilon | Cons a (Nested [a])

size Epsilon     = 0
size (Cons _ xs) = 1 + size xs

If we specify the type signature size :: Nested a -> Int then Haskell accepts the program.

However, if we only allow a subset of algebraic data types, inductive types, then the data definition Nested becomes invalid because it is not inductive; and if I am not mistaken then the type inference of inductive polymorphic functions is indeed decidable. If so, then what is the algorithm used to infer the type of polymorphic inductive functions?

1

1 Answers

16
votes

You could type check it using explicit recursion with a primitive fix having type (a -> a) -> a. You can insert the fix by hand or automatically.

If you want to extend the type inferences then that's quite easy too. When encountering a recursive function f, just generate a new unification variable and put f with this type in the environment. After type checking the body, unify the body type with this variable and then generalize as usual. I think this is what you suggest. It will not allow you to infer polymorphic recursion, but this in general undecidable.