6
votes

In GHC's pipeline, Haskell source code is translated to Core, and Core is later (not necessarily as the immediately following step) translated to STG. One thing I still don't understand: at which points is the program "normal" code (i.e. plain text), and at which points is it something living in memory, such as an abstract syntax tree (AST)?

To make my question more precise, I'll split it into two parts:

1) When a Haskell source file is parsed, do we immediately construct ASTs of the Core language? If not, it seems we have to construct ASTs of full Haskell (which seems strange) and then either transform them into Core ASTs directly, or first emit a textual Core representation and parse that again to obtain Core ASTs.

2) Almost the same question applies to the Core-to-STG transition (though here I think I can assume that we start from Core ASTs, correct?)


1 Answer

14
votes

The Haskell source is first parsed into an AST of full Haskell, which is then typechecked.

After that, it is desugared to Core, translated to STG, then to Cmm, and finally to either assembly or LLVM code. All of these phases operate on ASTs; there is no textual representation at any stage until the assembly/LLVM code, which is written to a file and compiled using external tools.

It’s not strange to have an AST of full Haskell. In fact, it is required in order to report type errors in terms of the code the user wrote, rather than detecting type errors only at the level of Core.
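To make the idea concrete, here is a toy sketch of the same shape of pipeline: a "surface" AST is typechecked first, and only then desugared into a smaller "core" AST, with no text in between. None of these types or functions (`Src`, `Core`, `typecheck`, `desugar`, `compile`) are GHC's; they are made up for illustration and are far simpler than the real thing.

```haskell
module Main where

-- Toy "surface" AST: what the user wrote, including sugar such as `if`.
-- (Hypothetical type, not GHC's.)
data Src
  = SInt Int
  | SBool Bool
  | SAdd Src Src
  | SIf Src Src Src          -- if c then t else e
  deriving (Show, Eq)

-- Toy "core" AST: a smaller language where `if` has become a case on a Bool.
data Core
  = CInt Int
  | CBool Bool
  | CAdd Core Core
  | CCase Core [(Bool, Core)]
  deriving (Show, Eq)

data Ty = TInt | TBool
  deriving (Show, Eq)

-- Typechecking runs on the surface AST, so error messages can talk about
-- `if` and `+` as the user wrote them.
typecheck :: Src -> Either String Ty
typecheck (SInt _)  = Right TInt
typecheck (SBool _) = Right TBool
typecheck (SAdd a b) = do
  ta <- typecheck a
  tb <- typecheck b
  if ta == TInt && tb == TInt
    then Right TInt
    else Left "operands of + must be Int"
typecheck (SIf c t e) = do
  tc <- typecheck c
  tt <- typecheck t
  te <- typecheck e
  checkIf tc tt te

-- Helper for the `if` rule: Bool condition, branches of equal type.
checkIf :: Ty -> Ty -> Ty -> Either String Ty
checkIf tc tt te
  | tc /= TBool = Left "condition of if must be Bool"
  | tt /= te    = Left "branches of if must have the same type"
  | otherwise   = Right tt

-- Desugaring is a plain function from one in-memory tree to another.
desugar :: Src -> Core
desugar (SInt n)    = CInt n
desugar (SBool b)   = CBool b
desugar (SAdd a b)  = CAdd (desugar a) (desugar b)
desugar (SIf c t e) =
  CCase (desugar c) [(True, desugar t), (False, desugar e)]

-- Typecheck first; desugar only if the program is well typed.
compile :: Src -> Either String Core
compile src = typecheck src >> Right (desugar src)

main :: IO ()
main = do
  print (compile (SIf (SBool True) (SInt 1) (SAdd (SInt 2) (SInt 3))))
  print (compile (SIf (SInt 0) (SInt 1) (SInt 2)))
```

The second call in `main` is rejected before desugaring ever happens, and the message refers to `if`, a construct that no longer exists in the toy core language.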

You can find the AST for Haskell in the HsSyn modules and the AST of Core in CoreSyn (in recent GHC versions these live under GHC.Hs and GHC.Core).