Lexer rule terminates without matching terminator symbol

Question

fragment EXEC : ('E' 'X' 'E' 'C');

fragment CMD : ('C' 'M' 'D');

fragment BEGIN : ('B' 'E' 'G' 'I' 'N');

fragment END : ('E' 'N' 'D');

fragment SEMICOLON : ';';

ExecCommand : EXEC Whitespace CMD Whitespace BEGIN Whitespace? SEMICOLON ( options {greedy=false;} : . )* END Whitespace EXEC;

Begin : BEGIN;

End : END;

Exec : EXEC;

The ExecCommand rule terminates the scan on the first 'E' following the SEMICOLON and then fails if the next characters are not 'END'. The scan should terminate only on 'END', not on 'ELSE' or any other string beginning with 'E'.

In the generated lexer, the scan loop tests _LA(1) == 'E' instead of calling match('END').

Exec, Begin and End are also token rules. The ExecCommand rule is the first rule in the lexer grammar so it should have precedence.

How do I generate a rule that will accept any arbitrary text between the start and end symbols and not terminate until the end symbol is found?

I tried the following, but the grammar did not generate successfully:

ExecCommand : EXEC Whitespace CMD Whitespace BEGIN Whitespace? SEMICOLON ( options {greedy=false;} : ~(END Whitespace EXEC) )* END Whitespace EXEC;

Mike Lischke · Accepted Answer · 2014-02-10T07:46:56

By default there is only one character of lookahead, so a non-greedy loop terminates as soon as it sees the first input character that can follow the loop (the 'E'). It then attempts to match 'N' and 'D' (as directed by the END rule), which fails in your case. Try increasing the lookahead (parameter k) and see if that helps. If not, you will have to find a different approach.
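
To make the lookahead effect concrete, here is a minimal Python sketch. It is not ANTLR output, and `scan_body` and its parameters are hypothetical names chosen for illustration. It simulates a non-greedy loop whose exit test can see only the first k characters of the terminator, assuming the terminator is simply the string 'END':

```python
def scan_body(text: str, terminator: str = "END", k: int = 1) -> str:
    """Simulate a non-greedy (...)* loop whose exit test sees only k characters.

    Returns the text consumed by the loop, or raises ValueError when the
    loop exits early and the full terminator then fails to match.
    """
    prefix = terminator[:k]  # all that the exit decision can see
    pos = 0
    while pos < len(text) and not text.startswith(prefix, pos):
        pos += 1  # the wildcard '.' consumes one character
    # The loop has exited; the rule must now match the whole terminator.
    if not text.startswith(terminator, pos):
        raise ValueError(f"mismatch at offset {pos}: expected {terminator!r}")
    return text[:pos]
```

With the input " select 1 ELSE 2 END EXEC", calling `scan_body(..., k=1)` leaves the loop at the 'E' of 'ELSE' and raises ValueError, which mirrors the failure in the question. Calling it with `k=3` skips past 'ELSE' and returns " select 1 ELSE 2 ". Whether raising k in the actual grammar produces the same result depends on the ANTLR version and target, so treat this only as a model of the decision logic.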

As a side note: Exec and EXEC are identical apart from their names, so either remove EXEC (if you need that lexer token in your parser) or remove Exec (if that lexer token is used exclusively in your lexer). The same applies to END and BEGIN.
