22
votes

Is anyone aware of tutorials for walking ANTLR-generated ASTs in C#? The closest I was able to find is this, but it's not terribly helpful.

My goal is to walk through trees that I'm generating based on a domain-specific language that I'm working on, and to use the trees to output generated C# code.

A Java-based tutorial would be helpful, too -- anything that provides clear examples of how to traverse ANTLR ASTs.

4

4 Answers

20
votes

I managed to figure this out by adapting the example at the end of Manuel Abadia's article.

Here's my version, which I happen to be using to convert parsed code to C#. These are the steps:

  1. Instantiate an ANTLRStringStream or subclass with your input (it can be a file or string).
  2. Instantiate your generated lexer, passing in that string stream.
  3. Instantiate a token stream with the lexer.
  4. Instantiate your parser with that token stream.
  5. Get the top-level value from your parser, and turn it into a CommonTree.
  6. Traverse the tree:

To get the literal text of a node, use node.Text. To get the token name of a node, use node.Token.Text.

Note that node.Token.Text will only give you the actual name of your token if it's an imaginary token with no corresponding string. If it's a real token, then node.Token.Text will return its string.

For example, if you had the following in your grammar:

tokens { PROGRAM, FUNCDEC }

EQUALS : '==';
ASSIGN : '=';

Then you'll get "PROGRAM", "FUNCDEC", "==", and "=" from the corresponding accesses of node.Token.Text.

You can see part of my example below, or you can browse the full version.


public static string Convert(string input)
{
    ANTLRStringStream sStream = new ANTLRStringStream(input);
    MyGrammarLexer lexer = new MyGrammarLexer(sStream);

    CommonTokenStream tStream = new CommonTokenStream(lexer);

    MyGrammarParser parser = new MyGrammarParser (tStream);
    MyGrammarParser.program_return parserResult = parser.program();

    CommonTree ast = (CommonTree)parserResult.Tree;

    Print(ast);
    string output = header + body + footer;

    return output;
}

public static void PrintChildren(CT ast)
{
    PrintChildren(ast, " ", true);
}

public static void PrintChildren(CT ast, string delim, bool final)
{
    if (ast.Children == null)
    {
        return;
    }

    int num = ast.Children.Count;

    for (int i = 0; i < num; ++i)
    {
        CT d = (CT)(ast.Children[i]);
        Print(d);
        if (final || i < num - 1)
        {
            body += delim;
        }
    }
}

public static void Print(CommonTree ast)
{
    switch (ast.Token.Text)
    {
        case "PROGRAM":
            //body += header;
            PrintChildren(ast);
            //body += footer;
            break;
        case "GLOBALS":
            body += "\r\n\r\n// GLOBALS\r\n";
            PrintChildren(ast);
            break;
        case "GLOBAL":
            body += "public static ";
            PrintChildren(ast);
            body += ";\r\n";
            break;

      ....
    }
}
8
votes

Normally you walk ASTs with recursion, and perform different actions based on the kind of the node. If you're using polymorphic tree nodes (i.e. different subclasses for different nodes in the tree), then double-dispatch in the Visitor pattern may be appropriate; however, that's usually not very convenient with Antlr.

In pseudocode, walking usually looks somewhat like this:

func processTree(t)
    case t.Type of
        FOO: processFoo t
        BAR: processBar t
    end

// a post-order process
func processFoo(foo)
    // visit children
    for (i = 0; i < foo.ChildCount; ++i)
        processTree(foo.GetChild(i))
    // visit node
    do_stuff(foo.getText())

// a pre-order process
func processBoo(bar)
    // visit node
    do_stuff(bar.getText())
    // visit children
    for (i = 0; i < foo.ChildCount; ++i)
        processTree(foo.GetChild(i))

The kinds of processing are highly dependent on the semantics of the language. For example, handling an IF statement, with structure (IF <predicate> <if-true> [<if-false>]), when generating code for a stack machine like the JVM or CLR, might look somewhat like this:

func processIf(n)
    predicate = n.GetChild(0)
    processExpr(predicate) // get predicate value on stack
    falseLabel = createLabel()
    genCode(JUMP_IF_FALSE, falseLabel) // JUMP_IF_FALSE is called brfalse in CLR,
                                       // ifeq in JVM
    if_true = n.GetChild(1)
    processStmt(if_true)
    if_false = n.ChildCount > 2 ? n.GetChild(2) : null
    if (if_false != null)
        doneLabel = createLabel()
        genCode(JUMP, doneLabel)
    markLabel(falseLabel)
    if (if_false != null)
        processStmt(if_false) // if-false branch
        markLabel(doneLabel)

Generally everything is done recursively depending on the type of the current node, etc.

1
votes

You should look into writing a TreeParser; it can make the job of interpreting the tree much simpler.

For ANTLR 2.x see http://www.antlr2.org/doc/sor.html For ANTLR 3.x see http://www.antlr.org/wiki/display/ANTLR3/Tree+construction (java-based parser and tree parser example)

0
votes

I did something similar (but not really) and I ended up with a TreeParser.

I also suggest buying the ANTLR book. I found it to be more valuable than any web resource. It may not have all the answers but it sure helps with the basics.