I am trying to port an existing grammar, developed for an unknown tool, to ANTLR. The grammar has two tokens, such as TEXT and TEXT_WITHOUT_A. Some rules in the grammar should allow only text without an 'a', while the rest are fine with any text.

My initial attempt produced the following grammar. The problem is that ANTLR matches the more specific rule (txtwa) even though txt is a superset of it. If I enter something like 'sometextwth', which does not contain an 'a', ANTLR does not follow the rule for text (txt). The expected input is txt, and the provided input matches it, but ANTLR decides that the input matches txtwa and, even though txtwa is not expected at that point in the grammar, chooses not to use txt.

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 expr   :   (  txt)* ;
 txt    :   TEXT ;
 txtwa  :   LETTERS_MINUS_A;
 term   :   factor ( (MULT | DIV) factor)*;
 factor :   NUMBER;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/


NUMBER              :   (DIGIT)+ ;

WHITESPACE      :   ( '\t' | ' ' | '\r' | '\n' | '\u000C')+ {$channel = HIDDEN;} ;

fragment LETTER_MINUS_A :   ('b'..'z' | 'B'..'Z');

fragment LETTER :   ('a'..'z' | 'A'..'Z');


fragment DIGIT      :   '0'..'9' ;   



LETTERS_MINUS_A 
    :   LETTER_MINUS_A (LETTER_MINUS_A)*;       

TEXT    :   LETTER (LETTER)* ;

I'd like to use txt freely without having to write (txt | txtwa), which works, btw. What am I missing here?

1 Answer

You must realize that the lexer does not take into account what the parser needs at a particular time: it simply tries to construct a token going through the lexer rules from top to bottom.

Because you defined LETTERS_MINUS_A before TEXT, any run of letters that both rules can match becomes a LETTERS_MINUS_A token. A TEXT token is only created when the input contains at least one 'a' or 'A'.

This is simply how ANTLR works.
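To make the tie-break concrete, here is a toy model in plain Java. It is not ANTLR's real algorithm, just a sketch that assumes "when several rules match the same text, the one listed first wins". The class name FirstRuleWins and the method tokenTypeFor are made up for this illustration, and the regular expressions are my own translation of the two lexer rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model only: hypothetical names, not part of ANTLR.
class FirstRuleWins {
    // Rules kept in the same order as in the grammar from the question.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LETTERS_MINUS_A", Pattern.compile("[b-zB-Z]+"));
        RULES.put("TEXT", Pattern.compile("[a-zA-Z]+"));
    }

    // Returns the name of the first rule that matches the whole word.
    static String tokenTypeFor(String word) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(word).matches()) {
                return rule.getKey();
            }
        }
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        // No 'a' in the input: both rules match, the first one wins.
        System.out.println(tokenTypeFor("sometextwth"));
        // Contains 'a': only TEXT can match.
        System.out.println(tokenTypeFor("banana"));
    }
}
```

Note that the parser never gets a say here: by the time it asks for a TEXT token, the word has already been labelled.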

What you can do is simply throw away the LETTERS_MINUS_A rule and do something like this:

txt
 : TEXT 
 ;

txtwa 
 : TEXT 
   {
     if($TEXT.text.contains("a") || $TEXT.text.contains("A")) {
       // Unchecked, because the generated rule method only declares RecognitionException.
       throw new RuntimeException("Eeek, I saw an `[aA]`!");
     }
   }
 ;
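
If you'd rather keep the grammar action short, the check could live in a small Java helper that you can test without generating a parser. This is only a sketch: the class TextChecks and its methods are hypothetical names, and I'm assuming the Java target. It throws an unchecked exception so that it can be called from inside a rule action.

```java
import java.util.Locale;

// Hypothetical helper, not part of ANTLR.
final class TextChecks {
    private TextChecks() {}

    // True if the text contains 'a' or 'A'.
    static boolean containsLetterA(String text) {
        return text.toLowerCase(Locale.ROOT).indexOf('a') >= 0;
    }

    // Returns the text unchanged, or throws if it contains 'a' or 'A'.
    static String requireNoA(String text) {
        if (containsLetterA(text)) {
            throw new IllegalArgumentException("Eeek, I saw an `[aA]` in: " + text);
        }
        return text;
    }
}
```

The action in txtwa would then shrink to a single call such as TextChecks.requireNoA($TEXT.text);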