I've recently started using ANTLR. I'm currently trying to encode an expression grammar with +
, -
, *
and array[index]
and a few more constructs.
This is the desired grammar:
Exp -> Exp (+ | - | * | < | &&) Exp
| Exp [ Exp ]
| -Exp
| ( Exp )
| Exp.length
| true
| false
| Id
| this
| ! Exp
I first refactored this into AndExp
, SumExp
, ProdExp
and so on to resolve precedence. Roughly like this:
Exp -> AndExp
AndExp -> CmpExp (&& CmpExp)*
CmpExp -> SumExp (< SumExp)*
SumExp -> ProdExp ((+|-) ProdExp)*
ProdExp -> UnaryExp (Times UnaryExp)*
UnaryExp -> Minus* PrimaryExp
PrimaryExp -> Exp.length
| Exp [ Exp ]
| ! Exp
| true
| false
| Id
| this
I then realized that this uses left-recursion, and that ANTLR doesn't like that. I went on to eliminate the left-recursion and I ended up with this grammar:
grammar test;
options {
language=Java;
output=AST;
backtrack=true;
}
start : expression;
expression : andExp;
andExp : cmpExp (And^ cmpExp)*;
cmpExp : sumExp (LessThan^ sumExp)*;
sumExp : prodExp ((Plus | Minus)^ prodExp)*;
prodExp : unaryExp (Times^ unaryExp)*;
unaryExp : Minus* primaryExp;
primaryExp : INT primaryPrime
| 'true' primaryPrime
| 'false' primaryPrime
| 'this' primaryPrime
| ID primaryPrime
| '!' expression primaryPrime
| '('! expression ')'! primaryPrime
;
primaryPrime
: '[' expression ']' primaryPrime
| '.' ID '(' exprList ')' primaryPrime
| '.length' primaryPrime
| 'new' 'int' '[' expression ']' primaryPrime
| 'new' ID '(' ')' primaryPrime
|
;
exprList :
| expression (',' expression)*;
LessThan : '<';
Plus : '+';
Minus : '-';
Times : '*';
And : '&&';
Not : '!';
INT : '0' | ('1'..'9')('0'..'9')*;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
WS : ('\t' | ' ' | '\r' | '\n'| '\u000C') { $channel=HIDDEN; } ;
Q1: Is backtracking "required" for this type of grammar (I can't get it through ANTLR unless I activate it) or am I missing something simple?
Q2: When adding a few
^
and-> ^(TOKEN ...)
constructs to brush up the tree, I ran into the following annoying situation (because of theprimaryPrime
which is due to the left factoring):primaryPrime : '[' expression ']' primaryPrime -> ^(ARRAY_EXPR expression) //...
This turns an
array[index]
intoarray ARRAY_EXPR index
while I really want
ARRAY_EXPR array index
What is the best way to solve this? Am I on the right track here, or should I go with some other approach all together?