I am trying to do basic ANTLR-based scanning, but my lexer is not matching the tokens I expect. Here is the grammar:
lexer grammar DefaultLexer;
ALPHANUM : (LETTER | DIGIT)+;
ACRONYM : LETTER '.' (LETTER '.')+;
HOST : ALPHANUM (('.' | '-') ALPHANUM)+;
fragment
LETTER : UNICODE_CLASS_LL | UNICODE_CLASS_LM | UNICODE_CLASS_LO | UNICODE_CLASS_LT | UNICODE_CLASS_LU;
fragment
DIGIT : UNICODE_CLASS_ND | UNICODE_CLASS_NL;
For the grammar above, the input "hello. world" produces only the token world, whereas I would expect to get both hello and world. What am I missing? Thanks.
ADDED:
OK, I learned that the input "hello. world" matches more characters using rule HOST than ALPHANUM, so the lexer chooses HOST. Then, when it fails to match the rest of the input against HOST, it does not "look back" to ALPHANUM, because that is how the lexer works.
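To make sure I understand the behavior, here is a toy Python model of it. This is not ANTLR code and not how ANTLR is implemented; the ASCII-only regexes stand in for my Unicode fragments, and the names committing_lexer and fallback_lexer are made up for illustration. The first function commits to HOST as soon as a separator follows the alphanumeric run and drops the consumed text on failure, which is what I seem to be seeing. The second falls back to the shorter ALPHANUM match, which is what I want.

```python
import re

# ASCII stand-ins (assumption) for (LETTER | DIGIT)+ and the HOST rule.
ALNUM = re.compile(r"[A-Za-z0-9]+")
HOST = re.compile(r"[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)+")


def committing_lexer(text):
    """Toy model: commit to HOST once a separator follows; never fall back."""
    tokens, pos = [], 0
    while pos < len(text):
        run = ALNUM.match(text, pos)
        if run is None:
            pos += 1  # skip characters no rule can start with
            continue
        end = run.end()
        if end < len(text) and text[end] in ".-":
            host = HOST.match(text, pos)
            if host:
                tokens.append(("HOST", host.group()))
                pos = host.end()
            else:
                # HOST failed midway: the consumed run and separator are lost.
                pos = end + 1
        else:
            tokens.append(("ALPHANUM", run.group()))
            pos = end
    return tokens


def fallback_lexer(text):
    """Toy model: try HOST first, fall back to ALPHANUM if HOST fails."""
    tokens, pos = [], 0
    while pos < len(text):
        host = HOST.match(text, pos)
        run = ALNUM.match(text, pos)
        if host:
            tokens.append(("HOST", host.group()))
            pos = host.end()
        elif run:
            tokens.append(("ALPHANUM", run.group()))
            pos = run.end()
        else:
            pos += 1  # skip characters no rule can start with
    return tokens


print(committing_lexer("hello. world"))
print(fallback_lexer("hello. world"))
```

With the input "hello. world", the first function yields only world and the second yields both hello and world. Both agree on an input such as "example.com", which comes out as a single HOST token.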
How do I get around it?