16
votes

I'm working on my own toy programming language. For now I'm interpreting the source language from AST and I'm wondering what advantages compiling to a byte-code and then interpreting it could provide me.

For now I have three things in mind:

  • Traversing the syntax tree hundreds of time may be slower than running instructions in an array, especially if the array support O(1) random access(ie. jumping 10 instructions up and down).
  • In typed execution environment, I have some run-time costs because my AST is typed, and I'm constantly traversing it(ie. I have 10 types of nodes and I need to check what type I'm on now to execute). Maybe compiling to an untyped byte-code could help to improve this, since after type-checking and compiling, I would have an untyped values and code.
  • Compiling to byte-code may provide better portability.

Are my points correct? What are some other motivations behind compiling to bytecode?

2
By running on an interpretor it increases the portability of your codeLuis
@Luis, yeah this was already on my mind, I forgot to add ..sinan
@Luis: that's bogus. Serialized ASTs can be made equally portable. Bytecode may in fact not be portable; Python bytecode is specific to each version of the interpreter.Fred Foo
It is also easier than interpreting an AST - bytecode should be low level enough. Compiling a high level language to a bytecode is simpler than interpreting a high level language straight away. E.g., instead of maintaining an environment with complicated semantics, instead of looking up variables by name, etc., you'll just have a simple stack frame accessed via simple load and store instructions.SK-logic

2 Answers

10
votes

Speed is the main reason; interpreting ASTs is just too slow in practice.

Another reason to use bytecode is that it can be trivially serialized (stored on disk), so that you can distribute it. This is what Java does.

8
votes

The point of generating byte code (or any other "easily interpreted" form such as threaded code) is essentially performance.

For an AST intepreter to decide what to do next, it needs to traverse the tree, inspect nodes, determine the type of nodes, check the type of any operands, verify legality, and decide which special case of the AST-designated operator applies (it says "+", but it means 16 bit add or string concatenate?), before it finally performs some action.

If one takes the final action and generates some kind of easily interpreted structure, then at "execution" time the interpreter can focus simply on performing actions without all that checking/special-case determination.

Another recent excuse is that if you generate byte code for any of a number of well-known virtual machines (JVM, MSIL, Parrot, etc.) you don't even have to code the interpreter. For the JVM and MSIL, you also get the benefit of the JIT compilers associated with them, and with careful design of your language, compatibility with huge libraries, which are the real attraction of Java and C#.