0
votes

so i recently started learning Assembly language and im having trouble creating Abstract Syntax Trees (AST) and then implementing them in Assembly. So lets say i have this equation: z = (3 - 2*x)*x - 2*y + 1. So would the following AST be correct, as i know that there are multiple answers, each one differing in implementation?

                             =
                            / \
                           -   *
                          / \   \
                         *   3   *
                        / \     / \
                       2   x   -   +
                              / \  / \
                             x   2 y  1

From there, how would i implement this into my code(if the tree is correct)? I dont know where to start unfortunately. Thanks in advance.

2
The AST is incorrect as it should reflect the path to go to obtain the result.ivan_pozdeev
As for assembly implementation, that's up to you so as of now, this should be closed as "too broad".ivan_pozdeev
how would i fix the tree ?ShadowViper
Well, remember the operator precedence in arithmetics and make the tree conform to the operations that would be performed.ivan_pozdeev
i need a visual to understand, sorry.ShadowViper

2 Answers

0
votes

Applications that manipulate ASTs are not generally small, and it does very little good to code them in assembler. You are better off writing AST manipulation a higher level language, where you can write the code that processes the tree more easily. (See my bio for tools that push the envelope on AST manipulation).

If you insist, then the key problem is to define an AST node structure. As a practical matter, it should:

  • Hold parent and children links
  • Hold a node type
  • Hold a literal value value
  • Fit inside a cache line

(These constraints come from very big ASTs that our tools manipulate).

If you stick with MASM-x86, the following struct definition will probably be appropriate:

  ASTNODE STRUCT
  NodeType dword ?     ; holds type of AST node
  LiteralValue dword ? ; holds child count, literal value or pointer to big literal value, as indicated by the type
  Parent  dword ?
  Children dword ?
  ASTNODE ENDS

[You can easily define the equivalent for MASM-x64. If you don't know how, you shouldnt be doing this.]

We assume that there are many AST node types, to distinguish statements, operators, operands, identifiers, ... so we need to distinguish between them, thus NodeType.

Base on Node, the literal value contains (assumed mutually exclusive cases): * Nothing (not necessary) * Number of children, if the node type is a list node * A literal constant, if the node type is for a leaf holding a small value * A pointer to a literal constant, if the node type is for a leaf holding a literal value larger than 32 bits * A pointer to an identifier string or symbol table entry, if the node type is "identifier"

The "Children" slot is special: it is essentially a dynamic array of pointers to other AST nodes. For many AST node types, the number of children is implicitly known; you could use table lookup on the node type or the code can just "know". For list nodes, the number of children need to match the length of the list as specified by the literal field.

Any node with less than 4 children fits into 32 bytes, and should be correspondingly aligned. Nodes with more than 4 children should be cache-line aligned.

You still need to build a parser, and it must create nodes an link them together by filling in the pointer fields.

I think you will find that building a parser with AST construction is a lot of work (esp. in assembler), and then you need to build something that does something with the tree.

0
votes

You need to set up a stack and a queue (last in, first out and first in, first out blocks of memory). You will also need the priorities of the operations (BEDMAS - (^/*+-= is a good start but a quick search of the Web will give you up to 16 different priorities for different operators). Now loop through your expressions:

  1. If it's a value, add it to the queue
  2. While there's an operation on the stack with a higher priority, move that to the end of the queue.
  3. Add the operation to the stack.

When all arguments are done, take the remainder of the stack and add it to the end of the queue.

The queue is now in Reverse Polish orientation - value1, value2, op so your example would be

2,x,*,3,-,{right hand branch},=

Since this is an invalid left hand operator, it would fail but ignoring that for now and carrying on. For every item in the queue:

  1. If it's a value, push it onto the stack.
  2. If it's an operator, do the operation on the top items on the stack and return the value back onto the stack.
  3. The final result will be left on the stack.

So in this case the stack would go:

 - {empty}    ; 2
 - 2          ; x
 - 2,x        ; *(multiply top 2 items on stack and push result)
 - {2x}       ; 3
 - {2x},3     ; -(subtract top 2 items on stack and push result)
 - {2x-3}     ; (final result)