2
votes

I want a grammar and evaluator (ANTLR parse tree walker) that contains only binary non-terminals without the need to switch on the operator when visiting an expression node to determine what operation to take (as is the case for a pre-left factored grammar, since the visitor would visit an "additionNode" the visitor can assume statically that it must do an addition).

Fairly direct question. ANTLR supports left recursion, so this is a valid grammar

expr :
    | expr ('+'|'-') expr
    | expr ('*'|'/') expr
    | '(' expr ')'
    | literal
    ;

nifty, but any walker/visitor/compiler-back-end for this now has to do its own dispatching on types, which stinks:

onVisitExit(ExprContext ctx){
    left = compiledMap.get(ctx.getChild(0));
    right = compiledMap.get(ctx.getChild(2));
    operator = ctx.getChild(1);
    switch(operator.getToken()){
        case "+": compiledMap.put(ctx, left + right);
        case "-": compiledMap.put(ctx, left - right);
        case "*": compiledMap.put(ctx, left * right);
        case "/": compiledMap.put(ctx, left / right);
    }
}

the pros and cons to this strategy:

  • antlr builds a binary tree for me, where (binary) each rule has a left and a right argument, meaning I dont have to worry about while loops for kleene closures. I really like this
  • I have to manually dispatch (switch) on the token, rather than on the type of the node.

Using the more traditional & already left-factored grammar

expr                : addOrSub ;
addOrSub            : multOrDiv (('+'/'-') multOrDiv)* ;
multOrDiv           : bracks (('*'/'/') backs)* ;
bracks              : '(' expr ')' | literal ;
literal             : TOKEN ;

The corresponding visitor for this has the opposite pros and cons to the one 2 grammars above: ANTLR will do this dispatching on types for me --well mostly, still have to differentiate between '+' and '-' -- but now I have to include while loops for those kleene closures because I dont have a strict binary tree anymore which is annoying.

I think my ideal grammar would be like this

expression : expr ;

fragment expr : 
    (addition | subtraction) 
    | (multiplication | division)
    | brackets
    | literal
    ;

addition        : expr '+' expr ;
subtraction     : expr '-' expr ;
multiplication  : expr '*' expr ;
division        : expr '/' expr ;
brackets        : '(' expr ')' ;
literal         : TOKEN ; 

And this would solve all my problems, except of course that is illegal in ANTLR

1

1 Answers

2
votes

After writing out this question it got me thinking a little bit more functionally

Long story short, I'm using the grammar

expr : 
    | expr (plus|minus) expr
    | expr (multi|div) expr
    | '(' expr ')'
    | literal
    ;

plus   : '+' ;
minus  : '-' ;
multi  : '*' ;
div    : '/' ;

which gives us the listening-visitor:

onVisitExit(ExprContext ctx){
    left = values.get(ctx.child(0));
    right = values.get(ctx.child(2));
    operator = binaryOperators.get(ctx.child(1));

    result = operator.doUsing(left, right);
    values.put(ctx, result);
}

onVisitExit(PlusContext ctx){
    binaryOperators.put(ctx, (left, right) -> left + right);
}

onVisitExit(MinusContext ctx){
    binaryOperators.put(ctx, (left, right) -> left - right);
}

//...

which solves all of my problems --infact the first visitor implementation might very well not have a single if or for statement in it, which would be really nifty.

But to keep the long story long:

Standard polymorphism techniques would have you try to push the functionality in the switch into the the variable your switching on. So the code

switch(operator.getToken()){
    case "+": do(left + right);
    //...

becomes

operator.doOperationUsing(left, right);

and then the various implementations of operator would do different things. The problem is generating the different implemnetations for operator. With typical polymorphism, if you just use an enum, an enum with custom implementations per enum-instance really isn't any better than just switching. But here, we can use a visited non-terminal rule to generate that implementation, with a lambda :)