Building an interpreter: designing an AST

Question

So I am making an interpreter for a language which I am making which is similar to Python. Now I understand that this is no small task and I don't expect it to work very well or do much but I would like it to have some basic functionality (variables, functions, loops, if statements, etc...).

So currently I am at the stage where the interpreter takes a file, and splits it up into a list of tokens, and now I am ready to turn these tokens into an AST. I intend to do this with a recursive descent parser, which I believe I understand, but here is the problem. Let's say I have the following input

1 + 2 * 3

this would output 7, because using BIDMAS the multiplication is done first so

2 * 3 = 6

then the addition is done after

1 + 6 = 7

I know how to get this order as I have a simple grammar, but I do not know how to store this as an AST. To simplify things for the answers, lets assume this is the only input you will recieve and the grammar can be

program = add
add = mul {"+" mul}
mul = NUM {"*" NUM}

So basically, how do you make a data structure(s) to store an AST?

P.S. I am doing this in C.

You should buy the Compilers "dragon" book from Aho Seti and Ullman...and a tool called bison will help you to achieve what you intend in a fraction of the time you spend to do it "by hand" — Leo
unfortunately I cannot use bison as the aim of this project is to do all of it "by hand". thanks anyway — dangee1705

Frank C. Frank C. · Accepted Answer · 2017-04-12T17:51:11

Disclaimer: This representation is subjective and just meant to illuminate.

Fundamentally, your AST will be constructed like a binary tree where each AST node is a "C" structure that holds both a "left" and "right" pointer. The other elements of the AST are typically context sensitive. For example, a variable declaration versus a function definition or an expression in a function. For the example you cited, the rough tree would mirror this:

So by substituting the above nodes 1 + (2 * 3) with your AST construct would be similar to:

                 -----------------
                | type: ADDOP   |
                | left: struct* |
                | right: struct*|
                -----------------
              /                   \
             /                     \
 (ADDOP left points to)         (ADDOP right points to)
------------------------       --------------------------  
| type: NUMLITERAL     |       | type: MULTOP           |
| value: 1             |       | left: struct*          |
| left: struct* (null) |       | right: struct*         |
| right: struct*(null) |       --------------------------
------------------------              /               \
                                     /                 \

                    (MULTOP left points to)         (MULTOP right points to)
                    ------------------------       --------------------------  
                    | type: NUMLITERAL     |       | type: NUMLITERAL       |
                    | value: 2             |       | value: 3               |
                    | left: struct* (null) |       | left: struct* (null)   |
                    | right: struct*(null) |       | right: struct* (null)  |
                    ------------------------       --------------------------

I assume that you know enough about "C" and how to malloc nodes and assign the left/right pointers.

Now the remaining activity would be to do a post order traversal of the tree to either evaluate the expression and produce a result or to emit the appropriate intermediate code/machine code that aligns to a compiled result. Either choice bringing with it a massive amount of thinking and planning on your part.

BTW: As noted, the AST nodes are going to typically have attributes based on the level of abstraction you want to represent. Also note that a typical compiler may leverage multiple AST for different reasons. Yep, more reading/studying on your part.

Note: This illustrates the data structure for an AST but @mikeb answer is rock solid for how to get the string "1 + 2 * 3" into the nodes of such a structure.

Building an interpreter: designing an AST

2 Answers