Applications that manipulate ASTs are generally not small, and it does very little good to code them in assembler. You are better off writing AST manipulation in a higher-level language, where you can write the code that processes the tree more easily. (See my bio for tools that push the envelope on AST manipulation.)
If you insist, then the key problem is to define an AST node structure. As a practical matter, it should:
- Hold parent and children links
- Hold a node type
- Hold a literal value
- Fit inside a cache line
(These constraints come from very big ASTs that our tools manipulate).
If you stick with MASM-x86, the following struct definition will probably be appropriate:
    ASTNODE STRUCT
        NodeType     dword ?   ; holds type of AST node
        LiteralValue dword ?   ; holds child count, literal value or pointer to big literal value, as indicated by the type
        Parent       dword ?   ; pointer to the parent node
        Children     dword ?   ; first slot of the array of child pointers
    ASTNODE ENDS
[You can easily define the equivalent for MASM-x64. If you don't know how, you shouldn't be doing this.]
There are many AST node types (statements, operators, operands, identifiers, ...) and the code needs to tell them apart, thus NodeType.
Based on NodeType, the literal value contains one of the following (the cases are assumed to be mutually exclusive):
* Nothing (the node type needs no value)
* Number of children, if the node type is a list node
* A literal constant, if the node type is for a leaf holding a small value
* A pointer to a literal constant, if the node type is for a leaf holding a literal value larger than 32 bits
* A pointer to an identifier string or symbol table entry, if the node type is "identifier"
The "Children" slot is special: it is essentially a dynamic array of pointers to other AST nodes. For many AST node types, the number of children is implicitly known; you could use table lookup on the node type or the code can just "know". For list nodes, the number of children needs to match the length of the list as specified by the literal value field.
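As a rough illustration of the table-lookup idea, here is a minimal MASM-x86 sketch. The node type numbers, the `ChildCountTable` contents, the `-1` "variable count" marker and the register conventions are all assumptions made up for this example, and it does no range checking on NodeType.

```asm
.386
.model flat, stdcall
option casemap:none

ASTNODE STRUCT
    NodeType     dword ?    ; type of AST node
    LiteralValue dword ?    ; child count, literal, or pointer, per NodeType
    Parent       dword ?    ; pointer to the parent node
    Children     dword ?    ; first slot of the array of child pointers
ASTNODE ENDS

; Hypothetical node type numbering, for illustration only.
NT_LIST   EQU 0             ; list node: child count is in LiteralValue
NT_NUMBER EQU 1             ; leaf holding a small literal: no children
NT_ADD    EQU 2             ; binary operator: always two children

.const
; Child count per node type, indexed by NodeType.
; -1 means "variable": read the count from the node's LiteralValue.
ChildCountTable dword -1, 0, 2

.code

; GetChildCount (hypothetical helper)
;   in:  ESI = pointer to node
;   out: EAX = number of children
GetChildCount PROC
    mov  eax, [esi].ASTNODE.NodeType
    mov  eax, ChildCountTable[eax*4]      ; fixed count for this type, or -1
    cmp  eax, -1
    jne  have_count
    mov  eax, [esi].ASTNODE.LiteralValue  ; list node: count is stored in the node
have_count:
    ret
GetChildCount ENDP

; GetChild (hypothetical helper)
;   in:  ESI = pointer to node, ECX = zero-based child index
;   out: EAX = pointer to that child
GetChild PROC
    mov  eax, [esi + ecx*4].ASTNODE.Children
    ret
GetChild ENDP

END
```

A tree walker would call `GetChildCount` once per node and then loop ECX from 0 up to that count, calling `GetChild` for each index.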
Any node with 4 or fewer children fits into 32 bytes (a 12-byte header plus up to four 4-byte child pointers is at most 28 bytes), and should be aligned on a 32-byte boundary. Nodes with more than 4 children should be cache-line aligned.
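One way to get that alignment is to round addresses up when carving nodes out of a memory pool. The following is only a sketch: the `AllocNode` name and its register conventions are invented, the cache line is assumed to be 64 bytes, a node with exactly 4 children is treated as small (12 + 4*4 = 28 bytes still fits in 32), and there is no check for running off the end of the pool.

```asm
.386
.model flat, stdcall
option casemap:none

.code

; AllocNode (hypothetical bump allocator)
;   in:  ECX = number of children the node will have
;        EDX = address of a dword holding the next free byte in the pool
;   out: EAX = pointer to the new, suitably aligned node
;   ECX is clobbered; the dword at [EDX] is advanced past the node.
AllocNode PROC
    mov  eax, [edx]               ; next free address in the pool
    cmp  ecx, 4
    ja   many_children
    add  eax, 31
    and  eax, -32                 ; round up to a 32-byte boundary
    lea  ecx, [eax + 32]          ; a small node occupies one 32-byte slot
    jmp  update_pool
many_children:
    add  eax, 63
    and  eax, -64                 ; round up to an (assumed) 64-byte cache line
    lea  ecx, [eax + ecx*4 + 12]  ; 12-byte header plus 4 bytes per child
update_pool:
    mov  [edx], ecx               ; record where the next node may start
    ret
AllocNode ENDP

END
```

Adding (alignment - 1) and then masking off the low bits is the usual way to round up to a power-of-two boundary; it leaves an address that is already aligned unchanged.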
You still need to build a parser, and it must create nodes and link them together by filling in the pointer fields.
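Filling in the pointer fields might look like the following sketch. The `InitNode` and `LinkChild` names and their register conventions are hypothetical, and the node memory is assumed to have been allocated already with room for the child slot being written.

```asm
.386
.model flat, stdcall
option casemap:none

ASTNODE STRUCT
    NodeType     dword ?    ; type of AST node
    LiteralValue dword ?    ; child count, literal, or pointer, per NodeType
    Parent       dword ?    ; pointer to the parent node
    Children     dword ?    ; first slot of the array of child pointers
ASTNODE ENDS

.code

; InitNode (hypothetical helper)
;   in:  ESI = pointer to node memory
;        EAX = node type, EBX = literal value (or child count for a list)
;   Leaves the node unparented (Parent = 0).
InitNode PROC
    mov  [esi].ASTNODE.NodeType, eax
    mov  [esi].ASTNODE.LiteralValue, ebx
    mov  [esi].ASTNODE.Parent, 0
    ret
InitNode ENDP

; LinkChild (hypothetical helper)
;   in:  EDI = pointer to parent node
;        ESI = pointer to child node
;        ECX = zero-based child slot in the parent
LinkChild PROC
    mov  [edi + ecx*4].ASTNODE.Children, esi  ; parent -> child link
    mov  [esi].ASTNODE.Parent, edi            ; child -> parent link
    ret
LinkChild ENDP

END
```

A parser action for something like a binary operator would initialize the operator node, then call `LinkChild` twice with ECX = 0 and ECX = 1 for its two operand subtrees.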
I think you will find that building a parser with AST construction is a lot of work (esp. in assembler), and then you need to build something that does something with the tree.