If I have an ANTLR grammar as follows:
grammar Test;
options {
language = Java;
}
rule : (foo | bar);
foo : FOO ',' FOO;
bar : BAR;
FOO: ('0'..'9')+;
BAR: ('a'..'z' | 'A'..'Z' | '0'..'9' | ' ')+;
WHITESPACE: (' ' | '\t')+ { $channel=HIDDEN; };
And I use a test string:
12abc3
this (I believe) is a BAR token, which satisfies the bar rule, and it is parsed as such. Bravo.
However, if I have this string:
12
I receive the error: line 1:2 mismatched input '<EOF>' expecting ','
This seems rather non-deterministic, although I'm sure it's not. I understand I'm already in trouble by having two tokens, FOO and BAR, that both accept digits. But if the parser is going to succeed or fail, it should succeed or fail consistently. In other words, in the first case the first character is a 1, and it is apparently being evaluated as part of a BAR token, so the parser heads down a successful path. In the second case, the SAME first character is being evaluated as part of a FOO token, so the path is doomed to fail, despite the fact that the string COULD be a successful bar parse. Why the inconsistency? Or am I missing something more fundamental about ANTLR and/or parsing?
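To make the two cases concrete, here is a minimal Java sketch of what may be going on. It assumes the lexer runs before the parser, takes the longest match at each position, and on a tie prefers the rule defined first. It mimics the lexer rules with regular expressions. The class name LongestMatchSketch, the tokenize helper, and the COMMA rule name (standing in for the ',' literal) are hypothetical, and this is not the code ANTLR generates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchSketch {
    // Lexer rules in grammar order; LinkedHashMap preserves that order,
    // which is what the tie-breaking below relies on.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("COMMA", Pattern.compile(","));            // stand-in for the ',' literal
        RULES.put("FOO", Pattern.compile("[0-9]+"));
        RULES.put("BAR", Pattern.compile("[a-zA-Z0-9 ]+"));
        RULES.put("WHITESPACE", Pattern.compile("[ \\t]+"));
    }

    // Assumed behavior: longest match wins; on equal length, the earlier rule wins.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer only, so an earlier rule keeps a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            // WHITESPACE goes to the hidden channel in the grammar, so skip it here.
            if (!bestName.equals("WHITESPACE")) {
                tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12abc3")); // [BAR(12abc3)]
        System.out.println(tokenize("12"));     // [FOO(12)]
    }
}
```

Under that assumption, 12abc3 comes out as a single BAR token because only BAR can consume all six characters, while 12 is matched equally well by FOO and BAR and goes to FOO because it is listed first. The parser would then see a lone FOO, enter the foo rule, and complain that ',' is missing, which is consistent with the error above.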