3
votes

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).

Given the input (2-3)(4+5)(6*7) I'm trying to add the missing * operator to the AST tree while parsing, in the following grammar I think I've managed to achieve that but I'm wondering if this is the correct, most elegant way?

grammar G; 

options {
    language = Java;
    output=AST;
ASTLabelType=CommonTree;
}

tokens {
  ADD = '+' ;
  SUB = '-' ;
  MUL = '*' ;
  DIV = '/' ;
  OPARN = '(' ;
  CPARN = ')' ;
}

start
    : expression EOF!
    ;

expression
    : mult (( ADD^ | SUB^ ) mult)*
    ;

mult
   : atom (( MUL^ | DIV^) atom)*    
   ;

atom
   : INTEGER
   | (
       OPARN  expression CPARN -> expression
     )

     (
       OPARN  expression CPARN -> ^(MUL expression)+
     )*  
   ;


INTEGER : ('0'..'9')+ ;
WS  : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};

This grammar appears to output the correct AST Tree in ANTLRworks:

AST Output

I'm only just starting to get to grips with parsing and ANTLR, don't have much experience so feedback with really appreciated!

Thanks in advance! Carl

1

1 Answers

3
votes

First of all, you did a great job given the fact that you've never used ANTLR before.

You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:

options {
  output=AST;
}

Also, you don't have to specify the root node for each operator separately. So you don't have to do:

(ADD^ | SUB^)

but the following:

(ADD | SUB)^

will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.

Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:

atom
   : INTEGER
   | (
       OPARN  expression CPARN -> expression
     )
     (
       OPARN  e=expression CPARN -> ^(MUL $atom $e)
     )*  
   ;

which produces the following AST given the input "(2-3)(4+5)(6*7)":

enter image description here

(image produced by: graphviz-dev.appspot.com)

The DOT file was generated with the following test-class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
    GParser parser = new GParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.start().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}