9
votes

I'm writing a little Haskell compiler, and I want to implement as much Haskell 2010 as possible. My compiler can parse a module, but completing modules to a program seems to be a non-trivial task. I made up some examples of tricky, but maybe valid, Haskell modules:

module F(G.x) where
  import F as G
  x = 2

Here the module F exports G.x, but G.x is the same as F.x, so module F exports x if, and only if, it exports x.

module A(a) where
  import B(a)
  a = 2

module B(a) where
  import A(a)

In this example, to resolve the exports of module A the compiler has to check if a imported from B is the same as the declared a = 2, but B exports a if, and only if, A exports a.

module A(f) where
  import B(f)

module B(f) where
  import A(f)

During resolving module A, the compiler may've assumed that f imported from B exists, implying that A exports f, thus B can import A(f) and export f. The only problem is that there's no f defined anywhere :).

module A(module X) where
  import A as X
  import B as X
  import C as X
  a = 2

module B(module C, C.b) where
  import C
  b = 3

module C(module C)
  import B as C
  c = 4

Here, the module exports cause that export lists are dependent on each other and on themselves.

All these examples should be valid Haskell, as defined by the Haskell 2010 spec.

I want to ask if there is any idea how to correctly and completely implement Haskell modules?

Assume that a module contains just (simple) variable bindings, imports (possibly with as or qualified), and exports list of possibly qualified variables and module ... abbreviations. The algorithm has to be able to:

  • compute finite list of exported variables of each module
  • link every exported variable to its binding
  • link every (maybe qualified) variable used in every module to its binding
1

1 Answers

9
votes

You may be interested in A Formal Specification for the Haskell 98 Module System.

I'm also covering some interesting edge cases in a series of blog posts, of which only the first one is published as of now.

Finally, I'm working on exactly that — a library that handles Haskell modules. It's called haskell-names.

Depending on your goals, you can simply use it in your compiler, study the source code or contribute. (Your examples will make excellent test cases.)


To answer your question: recursive modules are handled by computing a fixed point.

You start with a strongly connected component in the module graph. For each module in this component, you start with an assumption that it exports nothing. Then you revisit these modules, and compute new export lists based on the fresh information. You can prove that this process is monotonic — every time the export list grows (or, at least, does not shrink). Sooner or later it stops growing — then you've reached the fixed point.

You can optimize this algorithm by borrowing some ideas from static analysis (that community is very good at computing fixed points), but my package currently implement the naive algorithm (code).