
Using ANTLR version 4.3 here.

This grammar accepts a variety of EOF-delimited expressions, one at a time. Each expression starts with a keyword and varies in syntax thereafter. Sample accepted strings:

"cycle in freerun" <EOF>
"runtime <= 20m" <EOF>
"grab enabled" <EOF>

ANTLR merrily parses all expressions into components, the listener acts on the relevant components, life is happy. Here is a representative snippet of the grammar:

expr               // <-- Start rule
  : freq_p EOF
  | cycle_p EOF
  ...              // Many more, ad nauseam
  ;

freq_p  : FREQ '=' INT | FREQ '<' INT ;

cycle_p : CYCLE IN cycles ;
cycles  : cycle (',' cycle)* ;
cycle   : PHASELOCK | FREERUN ;

// Keywords
CYCLE     : 'cycle' ;
FREERUN   : 'freerun' ;
FREQ      : 'frequency' ;
IN        : 'in' ;
PHASELOCK : 'phaselock' ;

INT       : '0'..'9'+ ;
WS        : [ \n\t\r]+ -> skip ;

But now, I need to extend the grammar to incorporate 2 new expressions, both of which end with the acceptance of any sequence of characters at all (including Unicode) through to the EOF. Sample input:

"echo = Confirm 'interlock' is clear,\n and actuate \"frequency\" button." <EOF>
"report Process complete." <EOF>

I'm having great difficulty expressing the acceptance of all input to EOF in the grammar. The following changes lead to misery:

expr
  ...
  | echo_p EOF
  | report_p EOF
  ...

echo_p   : ECHO   '=' REMAINING ; // Snarfs all remaining input until EOF
report_p : REPORT     REMAINING ; // Ditto

ECHO     : 'echo' ;
REPORT   : 'report' ;

REMAINING : <WHAT_GOES_HERE?> ;  // .* messes up everything else

How can this be accomplished? The desired outcome is that the parse tree listener can obtain the remaining text, e.g. via REMAINING().getText().

Abandoned Approach: Lexer Grammar utilizing mode()

I tried writing REMAINING into a separate lexer grammar and importing it from the combined grammar, but ran into https://github.com/antlr/antlr4/issues/160 and compile-time warnings. The IntelliJ ANTLR plugin also malfunctions, which hurts productivity. I learned that importing a lexer grammar that uses modes is unsupported, at least in ANTLR 4.3.

lexer grammar Remainder;

@lexer::members {
// Needed at least until ANTLR issue #160 is fixed.
public static final int CONSUME_ALL = 123;
}

REMAINING : . -> more, mode(CONSUME_ALL) ;

mode CONSUME_ALL;

TEXT : .* ; // Consume all remaining input
Is it possible to "short-circuit" ANTLR, i.e. in the parse tree listener? Can the listener's enterReport_p() method snarf the remaining character stream as a String, and then instruct ANTLR to stop (simulating or skipping directly to EOF?) and indicate success? – Ty.
I'm having a similar issue. Looking for ways to inject pseudo EOF tokens into the token stream, perhaps with a token rewriter. If successful I will post an answer here. – Ross Youngblood
Wonderful – Please do! – Ty.

1 Answer


You should make .* nongreedy by adding ? to it:

REMAINING : .*? ;

This will consume everything until it finds EOF.

Have a look here: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Wildcard+Operator+and+Nongreedy+Subrules
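
If the lexer keeps fighting back, another option is to peel off the free-form tail before the input ever reaches ANTLR. The following is only a minimal sketch of that workaround in plain Java. It is not an ANTLR feature, and the class name RemainderSplitter and the method remainder() are hypothetical names invented for illustration. It assumes that "echo" is followed by "=" and "report" by at least one whitespace character, as in the sample input from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): recognises the two free-form
// expressions up front so the grammar never has to lex their tails.
final class RemainderSplitter {
    // DOTALL lets '.' match line terminators, so the tail may span lines.
    private static final Pattern ECHO =
            Pattern.compile("\\s*echo\\s*=\\s*(.*)", Pattern.DOTALL);
    private static final Pattern REPORT =
            Pattern.compile("\\s*report\\s+(.*)", Pattern.DOTALL);

    // Returns the text after "echo =" or "report", or empty when the input
    // is some other expression that should go through the normal parser.
    static Optional<String> remainder(String input) {
        for (Pattern p : new Pattern[] {ECHO, REPORT}) {
            Matcher m = p.matcher(input);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(remainder("report Process complete."));
        System.out.println(remainder("cycle in freerun"));
    }
}
```

When remainder() returns a value, the caller can act on the text directly. When it returns empty, the input can be handed to the existing expr rule unchanged. The price is that echo and report no longer appear in the parse tree, so this only fits if the listener does not need them there.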