6 votes

, where I define IR as a three-address-code style representation (I realize the term can also refer to an AST-level representation).

It is my understanding that, when writing a best-practice compiler for an imperative language, code optimization happens both on the AST (probably best implemented with the Visitor pattern) and on the IR produced from the AST.

(a) Is that correct?

(b) Which type of optimization steps are best handled on the AST before even producing an IR? (reference to an article/a list online welcome too as long as it deals with an imperative language) 

The compiler I'm working on is for Decaf (which some might know), which has a fairly deep context-free grammar, including (single) class inheritance; I'll add features not part of it, such as type coercion. It will be completely hand-coded (using no tools whatsoever). This is not homework; I'm writing it for fun.


3 Answers

3 votes

(a) Yes.

(b) Constant folding is one example; common-subexpression elimination (CSE) is another; in fact, almost anything to do with expression evaluation. IR-phase optimizations are more about what results from flow analysis.
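To make the AST-level case concrete, here is a minimal sketch of constant folding over an expression tree. The node classes (`Num`, `Var`, `BinOp`) are invented for this illustration, not from any real Decaf compiler:

```python
# Hypothetical AST nodes for a tiny expression language.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def fold(node):
    """Constant-fold an expression tree bottom-up: if both operands
    of a BinOp reduce to literals, evaluate it at compile time."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            ops = {'+': lambda a, b: a + b,
                   '-': lambda a, b: a - b,
                   '*': lambda a, b: a * b}
            return Num(ops[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node  # Num and Var are already as simple as they get

# x * (2 + 3): the constant subtree folds to 5; the variable survives.
tree = BinOp('*', Var('x'), BinOp('+', Num(2), Num(3)))
folded = fold(tree)
```

The same traversal skeleton works for other expression-level rewrites; with a Visitor pattern you would move the `isinstance` dispatch into per-node-type `visit` methods instead.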

3 votes

An IR is a form of AST (often it is "flattened", but there are deep tree IRs as well), and it may not be easy to distinguish one from the other, especially if the compiler is implemented as a sequence of very small rewrites from the original AST all the way down to a final IR suitable for instruction selection.

Optimisations may happen anywhere along this chain, but some representations are better suited to a wide range of optimisations; most notably SSA (static single assignment) form, which most modern compilers use to perform nearly all of their optimisations.
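The core idea of SSA is that every variable gets exactly one definition. Here is a hedged sketch of the renaming step for straight-line three-address code (the `(dest, op, arg1, arg2)` tuple format is invented for the example; real SSA construction also inserts phi nodes at control-flow joins, which straight-line code doesn't need):

```python
# Minimal SSA renaming for straight-line three-address code.
# No control flow here, so no phi nodes are required.

def to_ssa(instrs):
    """Give each assignment a fresh versioned name (x -> x1, x2, ...),
    and rewrite uses to refer to the latest version."""
    version = {}

    def use(name):
        # Names never assigned in this block (inputs) pass through unchanged.
        return f"{name}{version[name]}" if name in version else name

    out = []
    for dest, op, a, b in instrs:
        a, b = use(a), use(b)               # rewrite uses first
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", op, a, b))
    return out

code = [
    ("x", "add", "a", "b"),
    ("x", "mul", "x", "c"),   # redefinition of x gets a new SSA name
    ("y", "add", "x", "b"),
]
ssa = to_ssa(code)
# Every variable now has a single definition, which is what makes
# analyses like constant propagation and CSE so much simpler on SSA.
```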

1 vote

It's never too early to optimise (to coin a phrase). So there are optimisations performed before and during AST creation, on the AST itself, on the IR (if you have one), and on the code as it is generated. In C-like languages and those that compile to machine code, most of the effort goes into the later stages. In compilers targeting a VM, I think there is less room for improvement at that stage.

Some early optimisations obviously work better than others. I don't know much about Decaf, but there are the obvious things like constant folding and constant expression evaluation. If you get the whole program in tree form before you have to generate any code, you can find common subexpressions, do code motion, eliminate dead code and dead stores, hoist loop invariants, eliminate tail recursion, and do some kinds of strength reduction.
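As one concrete instance of the list above, here is a sketch of local common-subexpression elimination via value numbering over three-address code. The tuple format and names are illustrative only, and the sketch assumes operands are not redefined between the two computations (which holds automatically after SSA conversion):

```python
# Local CSE via value numbering: if an (op, arg1, arg2) triple has
# already been computed, reuse the earlier result instead of recomputing.
# Assumes operands are never redefined in between (true in SSA form).

def local_cse(instrs):
    seen = {}   # (op, a, b) -> destination that already holds the value
    out = []
    for dest, op, a, b in instrs:
        key = (op, a, b)
        if key in seen:
            out.append((dest, "copy", seen[key], None))
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out

code = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),   # same expression: becomes a copy of t1
    ("t3", "mul", "t1", "c"),
]
optimised = local_cse(code)
```

The resulting `copy` instructions are typically cleaned up afterwards by copy propagation, another cheap pass in the same family.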

A lot of it depends on how hard you want to work and what your target is. You didn't say much about that.