Say I have the following piece of ANTLR grammar (lexer part):
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
Ident : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
My thinking was that, since WS consumes all the white space between tokens, both "x y z" and "xyz" should be recognized as the same single Ident token. But apparently "x y z" is recognized as 3 separate Ident tokens, while only "xyz" is a single Ident. So I am confused about how a lexer rule behaves when white space is encountered.
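To make the comparison concrete, here is a minimal Python sketch of how I understand the tokenization to proceed. It is not ANTLR: the regular expressions are hand-written stand-ins for INTEGER, Ident and WS, and the names TOKEN_SPEC, HIDDEN and tokenize are made up for illustration. The assumption is that white space is matched as a token of its own and only then routed to a hidden list, rather than being deleted from the input before the other rules are tried.

```python
import re

# Hypothetical regex stand-ins for the lexer rules above; this is not ANTLR.
TOKEN_SPEC = [
    ("INTEGER", r"[0-9]+"),
    ("Ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("WS", r"[ \t\n\r\f]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Rules whose tokens are kept out of the visible token list,
# loosely imitating {$channel = HIDDEN;}.
HIDDEN = {"WS"}


def tokenize(text):
    """Split text into (rule, lexeme) pairs; return (visible, hidden) lists."""
    visible, hidden = [], []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no rule matches at position {pos}")
        token = (match.lastgroup, match.group())
        # White space is still matched as a token; it is only filed separately.
        (hidden if match.lastgroup in HIDDEN else visible).append(token)
        pos = match.end()
    return visible, hidden


if __name__ == "__main__":
    print(tokenize("x y z")[0])
    print(tokenize("xyz")[0])
```

Under these assumptions, tokenize("x y z") gives three visible Ident tokens plus two hidden WS tokens, while tokenize("xyz") gives one Ident token, which matches what I observe.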
More concretely, I have this rule:
VARIABLE: ('A'..'Z')+ DIGIT* ;
I want it to recognize variable identifiers like X3, Y4, XX55, etc. But surprisingly, this rule also seems to recognize " X Y", white space included, which I find incomprehensible. What could explain this?