0
votes

I'm just getting started with ANTLR. I'm trying to write a parser for field definitions that look like:

field_name = value;

Example:

is_true_true = yes;

My grammar looks like this:


    grammar Hello;

    //Lexer Rules
    
    fragment LOWERCASE  : [a-z] ;
    fragment UPPERCASE  : [A-Z] ;
    fragment DIGIT: '0'..'9';
    fragment TRUE: 'TRUE'|'true';
    fragment FALSE: 'FALSE'|'false';
    
    
    INTEGER : DIGIT+ ;
    STRING : ('\''.*?'\'') ;
    BOOLEAN : (TRUE|FALSE);
    
    
    WORD                : (LOWERCASE | UPPERCASE | '_')+ ;
    WHITESPACE          : (' ' | '\t')+ ;
    NEWLINE             : ('\r'? '\n' | '\r')+ ;
    
    field_def : WORD '=' WORD ';' ;
    

But when I run the generated parser on 'working = yes;' I get these error messages:

line 1:7 extraneous input ' ' expecting '='

line 1:9 extraneous input ' ' expecting WORD


I don't fully understand this. Is there an error in matching the WORD pattern, or is it something else entirely?

2
This definitely looks like a higher-level API than mere regular expressions, so I'm removing the tag. - Nissa
It looks like you aren't accounting for the whitespace in field_def. - Daniel A. White
@DanielA.White Thanks! That was the error. - Jakob Sachs

2 Answers

1
votes

Since whitespace is usually not significant to a grammar (i.e. it has no semantic meaning apart from separating words), ANTLR lets you simply skip it.

In ANTLR 4 this is done with:

WHITESPACE          : (' ' | '\t')+  -> skip;
NEWLINE             : ('\r'? '\n' | '\r')+ -> skip;

In ANTLR 3 the syntax is:

WHITESPACE          : (' ' | '\t')+ { $channel = HIDDEN; };
NEWLINE             : ('\r'? '\n' | '\r')+ { $channel = HIDDEN; };

In both cases the lexer still matches the whitespace as usual. With `-> skip` the matched text is discarded and never reaches the parser; with the hidden channel the tokens are still created, but on a channel the parser does not read. Either way the parser behaves as if the whitespace were not there, so you can keep your rules simple without adding optional whitespace everywhere.
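
For reference, here is a minimal sketch of how the grammar from the question might look for ANTLR 4 with this change applied. It assumes only WORD values are needed, so the INTEGER, STRING and BOOLEAN rules are left out for brevity; the rest is unchanged from the question.

```antlr
grammar Hello;

// Parser rule: the whitespace tokens never reach it, so it needs no changes
field_def : WORD '=' WORD ';' ;

// Lexer rules
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;

WORD       : (LOWERCASE | UPPERCASE | '_')+ ;

// Matched by the lexer, then thrown away instead of being passed on as tokens
WHITESPACE : (' ' | '\t')+ -> skip ;
NEWLINE    : ('\r'? '\n' | '\r')+ -> skip ;
```

With these rules, the input 'working = yes;' should reach the parser as WORD '=' WORD ';', which is exactly what field_def expects.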

1
votes

Your example input contains whitespace, but your field_def rule doesn't account for it: the lexer emits WHITESPACE tokens that the rule doesn't expect.