I am writing parser and lexer rules for the Decaf programming language in ANTLR4. When I try to parse a test file I keep getting an error. There must be something wrong in the grammar, but I can't figure out what.

My test file looks like:

class Program {
  int i[10];
}

The error is: line 2:8 mismatched input '10' expecting INT_LITERAL

And here is the full Decaf.g4 grammar file:

grammar Decaf;


/*
  LEXER RULES
  -----------
  Lexer rules define the basic syntax of individual words and symbols of a
  valid Decaf program. Lexer rules follow regular expression syntax.
  Complete the lexer rules following the Decaf Language Specification.
*/



CLASS : 'class';

INT : 'int';

RETURN : 'return';

VOID : 'void';

IF : 'if';

ELSE : 'else';

FOR : 'for';

BREAK : 'break';

CONTINUE : 'continue';

CALLOUT : 'callout';

TRUE : 'True' ;

FALSE : 'False' ;

BOOLEAN : 'boolean';

LCURLY : '{';

RCURLY : '}';

LBRACE : '(';

RBRACE : ')';


LSQUARE : '[';

RSQUARE : ']';
ADD : '+';

SUB : '-';

MUL : '*';

DIV : '/';

EQ : '=';

SEMI : ';';

COMMA : ',';

AND : '&&';

LESS : '<';

GREATER : '>';

LESSEQUAL : '<=' ;

GREATEREQUAL : '>=' ;

EQUALTO : '==' ;

NOTEQUAL : '!=' ;

EXCLAMATION : '!';



fragment CHAR : (' '..'!') | ('#'..'&') | ('('..'[') | (']'..'~') | ('\\'[']) | ('\\"') | ('\\') | ('\t') | ('\n');

CHAR_LITERAL : '\'' CHAR '\'';

//STRING_LITERAL : '"' CHAR+ '"' ;


HEXMARK : '0x';

fragment HEXA : [a-fA-F];

fragment HEXDIGIT : DIGIT | HEXA ;

HEX_LITERAL : HEXMARK HEXDIGIT+;


STRING : '"' (ESC|.)*? '"';

fragment ESC : '\\"' | '\\\\';




fragment DIGIT : [0-9];

DECIMAL_LITERAL : DIGIT(DIGIT)*;



COMMENT : '//' ~('\n')* '\n' -> skip;

WS : (' ' | '\n' | '\t' | '\r') + -> skip;

fragment ALPHA : [a-zA-Z] | '_';

fragment ALPHA_NUM : ALPHA | DIGIT;



ID : ALPHA ALPHA_NUM*;

INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

BOOL_LITERAL : TRUE | FALSE;

/*
  PARSER RULES
  ------------
  Parser rules are all lower case, and make use of lexer rules defined above
  and other parser rules defined below. Parser rules also follow regular
  expression syntax. Complete the parser rules following the Decaf Language
  Specification.
*/




program : CLASS ID LCURLY field_decl* method_decl* RCURLY EOF;

field_name : ID | ID LSQUARE INT_LITERAL RSQUARE;

field_decl : datatype field_name (COMMA field_name)* SEMI;

method_decl : (datatype | VOID) ID LBRACE ((datatype ID) (COMMA datatype ID)*)? RBRACE block;

block : LCURLY var_decl* statement* RCURLY;

var_decl : datatype ID (COMMA ID)* SEMI;


datatype : INT | BOOLEAN;

statement : location assign_op expr SEMI
        | method_call SEMI
        | IF LBRACE expr RBRACE block (ELSE block)?
        | FOR ID EQ expr COMMA expr block
        | RETURN (expr)? SEMI
        | BREAK SEMI
        | CONTINUE SEMI
        | block;
        
assign_op : EQ
          | ADD EQ
          | SUB EQ;
          
          
method_call : method_name LBRACE (expr (COMMA expr)*)? RBRACE
            | CALLOUT LBRACE STRING(COMMA callout_arg (COMMA callout_arg)*) RBRACE;


method_name : ID;

location : ID | ID LSQUARE expr RSQUARE;


expr : location
     | method_call
     | literal
     | expr bin_op expr
     | SUB expr
     | EXCLAMATION expr
     | LBRACE expr RBRACE;

 callout_arg : expr
            | STRING ;

bin_op : arith_op
      | rel_op
      | eq_op
      | cond_op;


arith_op : ADD | SUB | MUL | DIV | '%' ;

rel_op : LESS | GREATER | LESSEQUAL | GREATEREQUAL ;

eq_op : EQUALTO | NOTEQUAL ;

cond_op : AND | '||' ;

literal : INT_LITERAL | CHAR_LITERAL | BOOL_LITERAL ;

1 Answer

Whenever two or more lexer rules match the same characters (and so the same number of characters), the one defined first wins. In your case, these two rules both match 10:

DECIMAL_LITERAL : DIGIT(DIGIT)*;

INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

Since INT_LITERAL is defined after DECIMAL_LITERAL, the lexer will never create an INT_LITERAL token for that input. When you then use INT_LITERAL in a parser rule, you get the error message you posted.
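To see the tie-breaking in isolation, here is a minimal sketch. The grammar name Shadow and the rule start are hypothetical and not part of Decaf.g4; only the two literal rules mirror the ones in the question.

```antlr
grammar Shadow;

// Hypothetical entry rule: asks for an INT_LITERAL, just like field_name does.
start : INT_LITERAL EOF ;

fragment DIGIT : [0-9] ;

// Both rules below match "10" with the same length (2 characters).
DECIMAL_LITERAL : DIGIT+ ;          // defined first, so it wins the tie
INT_LITERAL     : DECIMAL_LITERAL ; // defined later, so it is never emitted for "10"

WS : [ \t\r\n]+ -> skip ;
```

Assuming the usual antlr4 and grun aliases from the ANTLR getting-started guide are set up, feeding the input 10 to `grun Shadow start -tokens` should list the token as DECIMAL_LITERAL, and the parser should then report the same kind of "mismatched input '10' expecting INT_LITERAL" error.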

The solution: remove INT_LITERAL from your lexer and create a parser rule instead:

int_literal : DECIMAL_LITERAL | HEX_LITERAL;

and use int_literal in your parser rules (field_name and literal) wherever INT_LITERAL appeared.
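Applied to the grammar in the question, the change might look like the sketch below. Only the rules that mention INT_LITERAL are shown; everything else is assumed to stay as posted.

```antlr
// Lexer: keep DECIMAL_LITERAL and HEX_LITERAL unchanged,
// and delete the INT_LITERAL lexer rule.

// Parser: int_literal replaces the old INT_LITERAL token.
int_literal : DECIMAL_LITERAL | HEX_LITERAL ;

field_name : ID | ID LSQUARE int_literal RSQUARE ;

literal : int_literal | CHAR_LITERAL | BOOL_LITERAL ;
```

One observation beyond the original answer: BOOL_LITERAL looks like it has the same ordering problem, because TRUE and FALSE are defined before it and match the same text. If boolean literals fail to parse later, a parser rule along the lines of `bool_literal : TRUE | FALSE ;` (a hypothetical name) would probably be the analogous fix.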