The following extremely simple example grammar doesn't lex as I expected (at all).
Declaration : 'VAR';
Letter: ('A'..'Z');
message : Declaration Letter+;
What I expected is that the sequence 'VAR' would be lexed as a single Declaration token, and any other sequence of letters would be lexed as individual Letter tokens.
When I look at the ANTLRWorks interpreter I see the following results:
- VARA parses into message -> "VAR", "A" (expected).
- VARVA doesn't parse (MismatchedTokenException(-1 != 5)). The lexer hits the second VA and tries to tokenise Declaration. Expected: message -> "VAR", "V", "A".
- VARVPP parses into message -> "VAR", "V", "P", "P" (expected).
- VARVALL parses into message -> "VAR", "VALL".
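To pin down the expectation for the inputs above, here is a rough sketch in Python. It is not ANTLR and makes no claim about how the ANTLR lexer works internally; it only spells out the tokenisation I had in mind: try 'VAR' first, otherwise take a single letter. The names TOKEN_RE and expected_tokens are made up for this illustration.

```python
import re

# Hypothetical stand-in for the two lexer rules:
# try the literal 'VAR' first, otherwise accept one letter A-Z.
TOKEN_RE = re.compile(r"VAR|[A-Z]")


def expected_tokens(text):
    """Split text into 'VAR' tokens and single uppercase letters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


for sample in ("VARA", "VARVA", "VARVPP", "VARVALL"):
    print(sample, "->", expected_tokens(sample))
```

With this sketch, VARVA comes out as "VAR", "V", "A", because the 'VAR' alternative fails on the trailing VA and the single-letter alternative is used instead. For VARVALL it gives "VAR", "V", "A", "L", "L", which is how I read "individual letters" for that input.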
I would like some help understanding this behaviour, and a suggestion for how to fix it.
Specifically:
- Why does the lexer try to tokenise all strings starting with VA into Declaration if it is followed by one letter?
- Why doesn't the lexer try to do this with all strings starting with a V?
- Why doesn't the lexer try to do this if there is an additional character there?
- How should I change this grammar to parse the way I expected?