ANTLR4 Unexpected Parse Behavior

Question

I am trying to build a new language with ANTLR and have run into a problem. I want to support numerical expressions and mathematical operations on numbers (pretty important, I reckon), but the parser isn't behaving as I expect. Here is my grammar:

grammar Lumos;

/*
 * Parser Rules
 */

 program        : 'start' stat+ 'stop';

 block          : stat*
                ;

 stat           : assign
                | numop
                | if_stat
                | while_stat
                | display
                ;



 assign         : LET ID BE expr ;

 display        : DISPLAY expr ;
 numop          : add | subtract | multiply | divide ; 


 add            : 'add' expr TO ID ;
 subtract       : 'subtract' expr 'from' ID ;
 divide         : 'divide' ID BY expr ; 
 multiply       : 'multiply' ID BY expr ;

 append         : 'append' expr TO ID ;

 if_stat
 : IF condition_block (ELSE IF condition_block)* (ELSE stat_block)?
 ;

condition_block
 : expr stat_block
 ;

stat_block
 : OBRACE block CBRACE
 | stat
 ;

while_stat
 : WHILE expr stat_block
 ;



 expr           : expr POW<assoc=right> expr        #powExpr
                | MINUS expr                        #unaryExpr
                | NOT expr                          #notExpr
                | expr op=(TIMES|DIV|MOD) expr      #multiplicativeExpr
                | expr op=(PLUS|MINUS) expr         #additiveExpr
                | expr op=RELATIONALOPERATOR expr   #relationalExpr
                | expr op=EQUALITYOPERATOR expr     #equalityExpr
                | expr AND expr                     #andExpr
                | expr OR expr                      #orExpr
                //| ARRAY                               #arrayExpr
                | atom                              #atomExpr
                ;                                   

 atom           : LPAREN expr RPAREN                #parExpr
                | (INT|FLOAT)                       #numberExpr

                | (TRUE|FALSE)                      #booleanAtom
                | ID                                #idAtom
                | STRING                            #stringAtom
                | NIX                               #nixAtom
                ;


compileUnit                     : EOF ;

/*
 * Lexer Rules
 */

 fragment LETTER    : [a-zA-Z] ;

 MATHOP             : PLUS
                    | MINUS
                    | TIMES
                    | DIV
                    | MOD
                    | POW
                    ;

    RELATIONALOPERATOR  : LTEQ
                        | GTEQ
                        | LT
                        | GT
                        ;  

    EQUALITYOPERATOR    : EQ
                        | NEQ
                        ;

 LPAREN             : '(' ;
 RPAREN             : ')' ;
 LBRACE             : '{' ;
 RBRACE             : '}' ;

 OR                 : 'or' ;
 AND                : 'and' ;

 BY                 : 'by' ;
 TO                 : 'to' ;
 FROM               : 'from' ;
 LET                : 'let' ;
 BE                 : 'be' ;


 EQ                 :'==' ;
 NEQ                :'!=' ;
 LTEQ               :'<=' ;
 GTEQ               :'>=' ;
 LT                 :'<' ;
 GT                 :'>' ;

 // Different statements will choose between these, but they are pretty much the same.
 PLUS               :'plus' ;
 ADD                :'add' ;
 MINUS              :'minus' ;
 SUBTRACT           :'sub' ;
 TIMES              :'times' ;
 MULT               :'multiply' ;

 DIV                :'divide' ; 
 MOD                :'mod' ;
 POW                :'pow' ;

 NOT                :'not' ;
 TRUE               :'true' ;
 FALSE              :'false' ;
 NIX                :'nix' ;
 IF                 :'if' ;
 THEN               :'then' ;
 ELSE               :'else' ;
 WHILE              :'while' ;
 DISPLAY            :'display' ;

 ARRAY              : '['(INT|FLOAT)(','(INT|FLOAT))+']';
 ID                 : [a-z]+ ;
 WORD               : LETTER+ ;

 //NUMBER               : INT | FLOAT ;

 INT                : [0-9]+ ; 

 FLOAT              : [0-9]+ '.' [0-9]*
                    | '.'[0-9]+ 
                    ;

 COMMENT            : '#' ~[\r\n]* -> channel(HIDDEN) ;
 WS                 : [ \n\t\r]+ -> channel(HIDDEN) ;
 STRING             : '"' (~["{}])+ '"' ;

When given the input let foo be 5 times 3, the visitor sees let foo be 5 and an extraneous times 3. I thought I set up the expr rule so that it would recognize a multiplication expression before it recognizes atoms, so this wouldn't happen. I don't know where I went wrong, but it does not work how I expected.

If anyone has any idea where I went wrong or how I can fix this problem, I would appreciate your input.

Bart Kiers · Accepted Answer · 2018-12-24T16:29:46

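You're using TIMES in your parser rules, but MATHOP also matches the text 'times'. When two lexer rules match the same amount of input, the one defined first wins, and MATHOP is defined before TIMES. The lexer therefore emits a MATHOP token for that input and never a TIMES token, which is why the alternative expr op=(TIMES|DIV|MOD) expr is never matched.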

I don't see you using MATHOP anywhere in your parser rules, so I recommend removing the MATHOP rule altogether.
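To make the tie-breaking concrete, here is a small Python sketch that imitates how the lexer picks a token: the longest match wins, and on a tie the rule listed first wins. This is only a simplified model, not ANTLR's actual implementation. The tokenize function and the two rule lists are hypothetical names made up for this illustration, and the rules are trimmed to the handful that matter for the input from the question.

```python
import re

def tokenize(text, rules):
    """Simplified model of lexer tie-breaking (an assumption for illustration,
    not ANTLR's real code): longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an equal-length match from a later rule
            # never replaces the match found by an earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # whitespace is skipped, like a hidden channel
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Same relative order as in the question's grammar, trimmed to the relevant rules.
ORIGINAL_RULES = [
    ("MATHOP", r"plus|minus|times|divide|mod|pow"),
    ("LET", r"let"),
    ("BE", r"be"),
    ("TIMES", r"times"),
    ("ID", r"[a-z]+"),
    ("INT", r"[0-9]+"),
    ("WS", r"[ \t\r\n]+"),
]

# The suggested fix: drop MATHOP so that TIMES gets a chance to match.
FIXED_RULES = [rule for rule in ORIGINAL_RULES if rule[0] != "MATHOP"]

source = "let foo be 5 times 3"
print(tokenize(source, ORIGINAL_RULES))  # fifth token is ('MATHOP', 'times')
print(tokenize(source, FIXED_RULES))     # fifth token is ('TIMES', 'times')
```

With the original rule order, the parser receives a MATHOP token where the expr rule expects TIMES, so the expression ends after 5. Once MATHOP is gone, the same input yields a TIMES token and the multiplicative alternative can match.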
