0
votes

I'm writing an interpreter for a programming language in C with Flex and Bison. I have no idea how to generate an AST and I would like to know what kind of performance difference there would be if I just interpreted the code as the interpreter see's it?

I would also like to know whether Bison generates an AST behind the scenes or not be I heard on some forums that it does and on other websites I have seen people create their own?

Thanks in advance, Francis

2

2 Answers

3
votes

re: How to build an AST using Bison:

For simple examples of building an AST from Bison, see this site: EPaperPress.

Also see Mitsuhisa Sato's example (This is code for a simple interpreter using Bison).

re: Performance difference of interpreter using AST vs. straight interpreting.

There are many ways you could write an interpreter - the list below is from (in general) slowest to fastest performance (note for trivial programs, there may be little difference. But for programs that repeat the same code many times, there can be a great difference):

  1. straight interpretation
  2. do some pre-processing, e.g., tokenizing and stripping comments
  3. build an AST and interpret that
  4. convert to intermediate code - byte code - via an AST or not - and interpret that

For a speed comparison, you could do the following:

Go to: Tom Gibson's Tiny C site.

And download the appropriate interpreter - either Tom Gibson's Tiny C for DOS, maybe Windows or Tom Gibson's Tiny C for Linux.

This is a pure interpreter - #1 above. Write a little program that computes primes, or something similar, and then time it.

The 'Tiny C' from Sato's site includes an interpreter that interprets from an AST - #3 above. Write a similar program for it, and time it.

Finally, get another Tiny C from: Marc Feeley's Tiny C.

This one creates an AST, converts it to byte code, and then interprets that - #4 above. Rewrite your little program for this one, and time that.

I hope this helps!

2
votes

Bison doesn't generate an AST on its own, but it does have features intended to help you write the code to build the AST.

First of all, the %union statement lets you define a union that represents the types of node you're going to use in your AST, so you can define types for things like variable declarations, expressions, etc.

Then, it lets you designate the union member to associate with the code that gets executed when a particular pattern is recognized, so you get type-checking, even though the C compiler basically does no type checking on a union.

As far as performance difference goes: it's next to impossible to guess. Generally building the AST first is intended as an optimization, but exactly how effective an optimization it will be depends both on the language you're interpreting and (especially) on the code you write.