I am writing a parser for a programming language I'm building and have run into an issue: ANTLR seems intent on not matching variable declarations.
Here is the grammar:

// Define a grammar called simc
grammar simc;

//Parser Rules
program : statement+ ;
statement : declaration | assignment | ( expression SEMICOLON ) ;
expression          : LEFTPAREN expression RIGHTPAREN                 #parenthesisExp
                    //Math
                    | <assoc=right>  expression '^' expression        #powerExp
                    | expression (ASTERISK|SLASH) expression          #mulDivExp
                    | expression (PLUS|MINUS) expression              #addSubExp
                    //Bool Operations
                    | expression EQUALITY expression                  #equalCompExp
                    | expression NONEQUALITY expression               #notequalCompExp
                    | expression GREATERTHAN expression               #greaterCompExp
                    | expression LESSTHAN expression                  #lessCompExp
                    | expression GREATERTHANOREQUALTO expression      #greaterorequalCompExp
                    | expression LESSTHANOREQUALTO expression         #lessorequalCompExp                   
                    //Any value that isn't an expression itself
                    | value                                           #valueExp
                    ;
value : constvalue | functioncall | variable ;
functioncall : IDENTIFIER LEFTPAREN expression? ( COMMA expression )? RIGHTPAREN ;
declaration : typelabel variable EQUALS expression SEMICOLON ;
assignment : variable EQUALS expression SEMICOLON ;
constvalue : intvalue | floatvalue | stringvalue | boolvalue ;
typelabel : INTLABEL | FLOATLABEL | STRINGLABEL | BOOLLABEL ;
variable : IDENTIFIER ;
intvalue : INTVALUE ;
floatvalue : FLOATVALUE ;
stringvalue : STRINGVALUE ;
boolvalue : BOOLVALUE ;

//Lexer Rules
IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
LEFTPAREN : '(' ;
RIGHTPAREN : ')' ;
INTLABEL : I N T ;
FLOATLABEL : F L O A T ;
STRINGLABEL : S T R I N G ;
BOOLLABEL : B O O L ;
INTVALUE : [0-9]+ ;
FLOATVALUE : [0-9]+ ( PERIOD [0-9]+ F? | F ) ;
STRINGVALUE : QUOTE ( '\\"' | . )*? QUOTE ;
BOOLVALUE : ( T R U E ) | ( F A L S E ) ;
SEMICOLON : ';' ;
ASTERISK : '*' ;
SLASH : '/' ;
PLUS : '+' ;
MINUS : '-' ;
EQUALS : '=' ;
EQUALITY : '==' ;
NONEQUALITY : '!=' ;
GREATERTHAN : '>' ;
LESSTHAN : '<' ;
GREATERTHANOREQUALTO : '>=' ;
LESSTHANOREQUALTO : '<=' ;
COMMA : ',' ;
PERIOD : '.' ;
QUOTE : '"' ;
fragment A : [aA] ; // match either an 'a' or 'A'
fragment B : [bB] ;
fragment C : [cC] ;
fragment D : [dD] ;
fragment E : [eE] ;
fragment F : [fF] ;
fragment G : [gG] ;
fragment H : [hH] ;
fragment I : [iI] ;
fragment J : [jJ] ;
fragment K : [kK] ;
fragment L : [lL] ;
fragment M : [mM] ;
fragment N : [nN] ;
fragment O : [oO] ;
fragment P : [pP] ;
fragment Q : [qQ] ;
fragment R : [rR] ;
fragment S : [sS] ;
fragment T : [tT] ;
fragment U : [uU] ;
fragment V : [vV] ;
fragment W : [wW] ;
fragment X : [xX] ;
fragment Y : [yY] ;
fragment Z : [zZ] ;
WS : [ \r\t\n]+ -> skip ;
COMMENT : ( ( '/' '/' .*? ( '\r'|'\t'|'\n' ) ) | '/*' .*? '*/' ) -> skip ;

The grammar should, if I'm not mistaken, match the code int a = 5; as a variable declaration. Instead, I get three statements: an empty statement (I don't understand how that is even possible), a statement marked as incorrect that contains the text int (in my testing, it only worked for valid type names), and a correct assignment. To the best of my understanding, declarations should be tried before assignments, right? Why does the input match like this, and how can I fix it?

1 Answer

If you look at the tokens generated for your input, you'll see that the lexer produces int as an IDENTIFIER, not an INTLABEL. This happens because IDENTIFIER is defined before INTLABEL in your grammar: when several lexer rules match the same amount of input, ANTLR uses the one that comes first in the grammar. You should therefore always define your identifier rule after the keywords. The same applies to FLOATLABEL, STRINGLABEL, BOOLLABEL and BOOLVALUE, which also come after IDENTIFIER in your grammar and so can never be produced.
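As a minimal sketch of that reordering, the keyword-like rules from the question could be moved above IDENTIFIER as shown here. Only the order changes; the rule bodies are copied from the question, and the remaining lexer rules (punctuation, operators, fragments, WS, COMMENT) are assumed to stay as they are.

```antlr
//Lexer Rules
// Keywords first: when several rules match the same text,
// the rule defined earliest in the grammar wins.
INTLABEL : I N T ;
FLOATLABEL : F L O A T ;
STRINGLABEL : S T R I N G ;
BOOLLABEL : B O O L ;
BOOLVALUE : ( T R U E ) | ( F A L S E ) ;
// IDENTIFIER comes after every keyword. A longer name such as "integer"
// still lexes as IDENTIFIER, because the longest match is preferred
// before rule order is considered.
IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
```

With this order, int in int a = 5; should be tokenized as INTLABEL, so the declaration rule can match. Dumping the token stream (for example with the TestRig -tokens option) is a quick way to confirm which token type each piece of input received.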