
I seem to be struggling with the AST->StringTemplate side of things, probably because I'm used to writing parsers by hand and going straight to LLVM.

What I'm looking for is a way to automatically match a parser rule to an AST class that represents it and contains a method to generate the target-language output (probably using StringTemplate, in this case).

In pseudo code, given this example grammar:

numberExpression
    : DIGIT+
    ;

I want to have it mapped to this AST class:

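class NumberExpressionAST extends BaseAST {
    private double value;

    // node: the tree node matched by numberExpression (e.g. an ANTLR 3 org.antlr.runtime.tree.Tree)
    public NumberExpressionAST(Tree node) {
        this.value = Double.parseDouble(node.getText());
    }

    public String generateCode() {
        // However we want to generate the output.
        // Maybe use a template, maybe string literals, maybe cure cancer...whatever.
        return String.valueOf(value);
    }
}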

To pair them up, there might be some glue code like the following (or you could go crazy with Class.forName reflection):

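switch (child.ruleName) {
    case "numberExpression":   // Java string literals need double quotes
        return new NumberExpressionAST(child);   // no break: code after return is unreachable
    default:
        throw new IllegalArgumentException("No AST class for rule: " + child.ruleName);
}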
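If the switch grows unwieldy, one alternative to both the switch and Class.forName is a registry that maps rule names to constructor references. The sketch below is plain Java with no ANTLR dependency; ParseNode, AstFactory, and BaseAST (made an interface here) are hypothetical names for illustration, and ParseNode stands in for whatever node type the parser actually hands you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the node the parser hands you: a rule name plus matched text.
final class ParseNode {
    final String ruleName;
    final String text;

    ParseNode(String ruleName, String text) {
        this.ruleName = ruleName;
        this.text = text;
    }
}

// Every AST class knows how to emit its own target-language code.
interface BaseAST {
    String generateCode();
}

final class NumberExpressionAST implements BaseAST {
    private final double value;

    NumberExpressionAST(ParseNode node) {
        this.value = Double.parseDouble(node.text);
    }

    @Override
    public String generateCode() {
        return Double.toString(value);   // a StringTemplate call could go here instead
    }
}

// Registry replacing the switch: rule name -> constructor reference.
final class AstFactory {
    private final Map<String, Function<ParseNode, BaseAST>> builders = new HashMap<>();

    void register(String ruleName, Function<ParseNode, BaseAST> builder) {
        builders.put(ruleName, builder);
    }

    BaseAST create(ParseNode node) {
        Function<ParseNode, BaseAST> builder = builders.get(node.ruleName);
        if (builder == null) {
            throw new IllegalArgumentException("No AST class registered for rule: " + node.ruleName);
        }
        return builder.apply(node);
    }
}
```

Registration, e.g. factory.register("numberExpression", NumberExpressionAST::new), can live in its own class, so supporting a new rule means adding one line there rather than touching the grammar.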

I've been scouring the web and found rewrite rules in the grammar (the -> syntax), but I can't figure out how to keep all this logic out of the grammar, especially the code that sets up the template and generates the target output from it. I'm OK with walking the tree multiple times.

I thought maybe I could use the option output=AST and then provide my own AST classes extending CommonTree? I'll admit my grasp of ANTLR is very primitive, so forgive my ignorance. Every tutorial I follow does all this inline in the grammar, which to me is insane and hard to maintain.

Can someone point me to a way of accomplishing something similar?

Goal: keep AST/codegen/template logic out of the grammar.

EDIT ---------------------------------------------

I've resorted to tracing through ANTLR's own source code (since ANTLR is built with itself), and I'm seeing similar things like BlockAST, RuleAST, etc., all inheriting from CommonTree. I haven't quite figured out the important part: how they're using them.

From looking around, I noticed you can give a token a node-class hint:

identifier
    : IDENTIFIER<AnyJavaClassIWantAST>
    ;

You can't do exactly the same for parser rules, but if you create an imaginary token to represent the rule as a whole, you can use a rewrite rule like so:

declaration
    : type identifier -> ^(SOME_PARSE_RULE<AnyJavaClassIWantAST> type identifier)
    ;

All this is closer to what I want, but ideally I shouldn't have to litter the grammar with these. Is there any way to put them somewhere else?

"Goal: keep AST/codegen/template logic out of the grammar... [I]deally I shouldn't have to litter the grammar..." It sounds like you want all the benefits of ANTLR with none of the benefits of ANTLR. ;) I think your only real options are to write your own grammar parser that does it your way or to bite the bullet and use ANTLR as it was designed: using generated code, specifying AST types in the grammar, and all that. – user1201210
I see your point, though maybe I was a bit too literal in my "goal". ANTLR is certainly more than just the grammar syntax being parsed, so I definitely want to harness its other features, but some level of abstraction from the actual grammar rules themselves would be nice. I think the identifier : IDENTIFIER<AnyJavaClassIWantAST> ; feature I found will suit me well enough. – jayphelps
If you're willing to switch to ANTLR 4, you may get closer to your goal using its alternative labels, which turn into listener events that are fired by the generated code. I don't know enough about them at this point to give a full-fledged answer, but it does look like a nice layer of language-neutral abstraction. – user1201210
@tenterhook Very cool. Could you add this as an answer and I'll gladly accept it? Thank you! – jayphelps

1 Answer


Could you add this as an answer...

Here is a contrived example that uses a handful of ANTLR 4 features that go a long way toward separating the grammar from the output language, mainly alternative labels and the generated listener. The grammar can represent a few trivial bits of code, but it contains no target-language references at all, not even a call to skip() for whitespace in the lexer (the -> skip command handles that instead). The test class converts the input to Java-like output using the generated listener.

I avoided using anything that I couldn't get to work on the first couple of tries, so don't consider this an exhaustive example by any means.

Simplang.g

grammar Simplang;


compilationUnit : statements EOF;
statements      : statement+;
statement       : block #BlockStatement 
                | call  #CallStatement
                | decl  #DeclStatement
                ;
block           : LCUR statements RCUR;    
call            : methodName LPAR args=arglist? RPAR SEMI;
methodName      : ID;
arglist         : arg (COMMA arg)*;
arg             : expr;    
decl            : VAR variableName EQ expr SEMI;
variableName    : ID;
expr            : add_expr;     
    
add_expr        : lhs=primary_expr (add_op rhs=primary_expr)*;
add_op          : PLUS | MINUS;    
primary_expr    : string=STRING
                | id=ID
                | integer=INT
                ;    
    
VAR: 'var';   
ID: ('a'..'z'|'A'..'Z')+;
INT: ('0'..'9')+;
STRING: '\'' ~('\r'|'\n'|'\'')* '\'';
SEMI: ';';
LPAR: '(';
RPAR: ')';
LCUR: '{';
RCUR: '}';
PLUS: '+';
MINUS: '-';    
COMMA: ',';
EQ: '=';
WS: (' '|'\t'|'\f'|'\r'|'\n') -> skip;

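Along with the lexer and parser, ANTLR 4 generates a listener interface and a default, empty implementation of it (SimplangBaseListener). Because every alternative of statement is labeled, the interface has enterBlockStatement/exitBlockStatement and friends instead of a single enterStatement/exitStatement pair. Here's the interface generated for the grammar above.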

SimplangListener.java

public interface SimplangListener extends ParseTreeListener {
    void enterArglist(SimplangParser.ArglistContext ctx);
    void exitArglist(SimplangParser.ArglistContext ctx);
    void enterCall(SimplangParser.CallContext ctx);
    void exitCall(SimplangParser.CallContext ctx);
    void enterCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void exitCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void enterVariableName(SimplangParser.VariableNameContext ctx);
    void exitVariableName(SimplangParser.VariableNameContext ctx);
    void enterBlock(SimplangParser.BlockContext ctx);
    void exitBlock(SimplangParser.BlockContext ctx);
    void enterExpr(SimplangParser.ExprContext ctx);
    void exitExpr(SimplangParser.ExprContext ctx);
    void enterPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void exitPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void enterAdd_expr(SimplangParser.Add_exprContext ctx);
    void exitAdd_expr(SimplangParser.Add_exprContext ctx);
    void enterArg(SimplangParser.ArgContext ctx);
    void exitArg(SimplangParser.ArgContext ctx);
    void enterAdd_op(SimplangParser.Add_opContext ctx);
    void exitAdd_op(SimplangParser.Add_opContext ctx);
    void enterStatements(SimplangParser.StatementsContext ctx);
    void exitStatements(SimplangParser.StatementsContext ctx);
    void enterBlockStatement(SimplangParser.BlockStatementContext ctx);
    void exitBlockStatement(SimplangParser.BlockStatementContext ctx);
    void enterCallStatement(SimplangParser.CallStatementContext ctx);
    void exitCallStatement(SimplangParser.CallStatementContext ctx);
    void enterMethodName(SimplangParser.MethodNameContext ctx);
    void exitMethodName(SimplangParser.MethodNameContext ctx);
    void enterDeclStatement(SimplangParser.DeclStatementContext ctx);
    void exitDeclStatement(SimplangParser.DeclStatementContext ctx);
    void enterDecl(SimplangParser.DeclContext ctx);
    void exitDecl(SimplangParser.DeclContext ctx);
}
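In case it helps to see what that generated code does for you, here is a toy plain-Java analogue of the listener pattern. It is not ANTLR's API: Node, Listener, Walker, and RecordingListener are hypothetical names, and the single enter/exit pair stands in for the per-rule methods in the interface above. The walker visits the tree depth-first, firing enter before a node's children and exit after them, which is the same order ANTLR's ParseTreeWalker uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature parse tree: a rule name and its children (no ANTLR types).
final class Node {
    final String rule;
    final List<Node> children;

    Node(String rule, Node... children) {
        this.rule = rule;
        this.children = Arrays.asList(children);
    }
}

// Analogue of a generated listener, collapsed to one enter/exit pair
// instead of one pair per rule.
interface Listener {
    void enter(Node node);
    void exit(Node node);
}

// Depth-first walk: enter fires before a node's children, exit after them.
final class Walker {
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Example listener that just records the order of events.
final class RecordingListener implements Listener {
    final List<String> events = new ArrayList<>();

    @Override
    public void enter(Node node) { events.add("enter " + node.rule); }

    @Override
    public void exit(Node node) { events.add("exit " + node.rule); }
}
```

The grammar only decides the tree's shape; everything target-specific lives in whichever listener you pass to the walk, which is the separation the question was after.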

Here's a test class that overrides a few methods of the empty base listener, registers it with addParseListener so the events fire while parsing, and then calls the parser's start rule.

SimplangTest.java

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class SimplangTest {

    public static void main(String[] args) {

        // On ANTLR 4.7+, CharStreams.fromString(...) is the non-deprecated replacement.
        ANTLRInputStream input = new ANTLRInputStream(
                "var x = 4;\nfoo(x, 10);\nbar(y + 10 - 1, 'x' + 'y' + 'z');");

        SimplangLexer lexer = new SimplangLexer(input);

        SimplangParser parser = new SimplangParser(new CommonTokenStream(lexer));

        // A parse listener receives enter/exit events while the parser runs,
        // so output is printed as each rule is recognized.
        parser.addParseListener(new SimplangBaseListener() {
            @Override
            public void exitArg(SimplangParser.ArgContext ctx) {
                System.out.print(", ");
            }

            @Override
            public void exitCall(SimplangParser.CallContext call) {
                System.out.print("})");
            }

            @Override
            public void exitMethodName(SimplangParser.MethodNameContext ctx) {
                System.out.printf("call(\"%s\", new Object[]{", ctx.ID()
                        .getText());
            }

            @Override
            public void exitCallStatement(SimplangParser.CallStatementContext ctx) {
                System.out.println(";");
            }

            @Override
            public void enterDecl(SimplangParser.DeclContext ctx) {
                System.out.print("define(");
            }

            @Override
            public void exitVariableName(SimplangParser.VariableNameContext ctx) {
                System.out.printf("\"%s\", ", ctx.ID().getText());
            }

            @Override
            public void exitDeclStatement(SimplangParser.DeclStatementContext ctx) {
                System.out.println(");");
            }

            @Override
            public void exitAdd_op(SimplangParser.Add_opContext ctx) {
                if (ctx.MINUS() != null) {
                    System.out.print(" - ");
                } else {
                    System.out.print(" + ");
                }
            }

            @Override
            public void exitPrimary_expr(SimplangParser.Primary_exprContext ctx) {
                if (ctx.string != null) {
                    // Strip the single quotes and re-emit as a Java string literal.
                    String value = ctx.string.getText();
                    System.out.printf("\"%s\"",
                            value.substring(1, value.length() - 1));
                } else if (ctx.id != null) {    // the id=ID alternative
                    System.out.printf("read(\"%s\")", ctx.id.getText());
                } else {                        // the integer=INT alternative
                    System.out.print(ctx.INT().getText());
                }
            }
        });

        parser.compilationUnit();
    }
}

Here's the test input hard-coded in the test class:

var x = 4;
foo(x, 10);
bar(y + 10 - 1, 'x' + 'y' + 'z');

Here's the output produced:

define("x", 4);
call("foo", new Object[]{read("x"), 10, });
call("bar", new Object[]{read("y") + 10 - 1, "x" + "y" + "z", });

It's a silly example, but it shows a few of the features that might be useful to you when building a custom AST.
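If the goal is a real AST rather than printing as you go, the same exit callbacks can feed a small stack-based builder, and each node then generates its own code. The following is a plain-Java sketch under assumptions: Expr, Literal, VarRead, BinOp, and ExprBuilder are hypothetical names, and the builder's methods are meant to be called from exitPrimary_expr, exitAdd_op, and exitAdd_expr overrides in a SimplangBaseListener subclass (that wiring is not shown). It relies on add_expr never nesting in this grammar, since primary_expr has no parenthesized form, so one operator list per expression is enough.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical AST: each node emits its own target code.
interface Expr {
    String generateCode();
}

final class Literal implements Expr {
    private final String code;
    Literal(String code) { this.code = code; }
    @Override public String generateCode() { return code; }
}

final class VarRead implements Expr {
    private final String name;
    VarRead(String name) { this.name = name; }
    @Override public String generateCode() { return "read(\"" + name + "\")"; }
}

final class BinOp implements Expr {
    private final Expr lhs;
    private final String op;
    private final Expr rhs;
    BinOp(Expr lhs, String op, Expr rhs) { this.lhs = lhs; this.op = op; this.rhs = rhs; }
    @Override public String generateCode() {
        return lhs.generateCode() + " " + op + " " + rhs.generateCode();
    }
}

// Builder whose methods mirror the listener's exit callbacks: leaves are pushed
// as they are recognized, operators are queued, and exitAddExpr folds them
// left-associatively into BinOp nodes.
final class ExprBuilder {
    private final Deque<Expr> operands = new ArrayDeque<>();
    private final List<String> operators = new ArrayList<>();

    void exitInt(String text) { operands.push(new Literal(text)); }

    void exitId(String name) { operands.push(new VarRead(name)); }

    void exitString(String quoted) {
        // 'abc' -> "abc"
        operands.push(new Literal("\"" + quoted.substring(1, quoted.length() - 1) + "\""));
    }

    void exitAddOp(String op) { operators.add(op); }

    void exitAddExpr() {
        int count = operators.size() + 1;   // n operators join n + 1 operands
        Expr[] items = new Expr[count];
        for (int i = count - 1; i >= 0; i--) {
            items[i] = operands.pop();      // the stack yields operands in reverse order
        }
        Expr acc = items[0];
        for (int i = 1; i < count; i++) {
            acc = new BinOp(acc, operators.get(i - 1), items[i]);
        }
        operators.clear();
        operands.push(acc);
    }

    Expr result() { return operands.pop(); }
}
```

Folding left to right keeps a - b - c as (a - b) - c, which starts to matter once the nodes do more than concatenate text.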