In my grammar I have these lexer rules:
DECIMAL_NUMBER: DIGITS? DOT_SYMBOL DIGITS;

// Identifiers might start with a digit, even though it is discouraged.
IDENTIFIER: LETTER_WHEN_UNQUOTED+;

fragment LETTER_WHEN_UNQUOTED:
    '0'..'9'
    | 'A'..'Z' // Only upper case, as we use a case insensitive parser (insensitive only for ASCII).
    | '$'
    | '_'
    | '\u0080'..'\uffff'
;

WHITESPACE: ( ' ' | '\t' | '\f' | '\r' | '\n' ) { $channel = HIDDEN; };
and this parser rule:
qualified_identifier: IDENTIFIER '.' IDENTIFIER;
This works nicely except for one special case, like this:
... a.0b
The problem here is that .0 is captured by the DECIMAL_NUMBER rule, so the lexer never produces the '.' and IDENTIFIER tokens that qualified_identifier expects. I need DECIMAL_NUMBER not to match when non-digit identifier characters directly follow the digits. How can this be done?
I was thinking about a validating predicate, but that would completely break parsing whenever the DECIMAL_NUMBER rule does not match. Another thought I had was to add an action that checks the character following what has been matched so far and then generates the tokens manually, which seems very ugly.
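To make the failure concrete, here is a small standalone Python sketch that imitates the lexer rules above with regular expressions. It is not ANTLR code and only approximates how the generated lexer chooses a rule; the names TOKEN_SPEC and naive_tokens are made up for this illustration, and the input is upper case because the grammar only accepts 'A'..'Z'.

```python
import re

# Rough regex equivalents of the lexer rules (illustrative assumption, not ANTLR).
TOKEN_SPEC = [
    ("DECIMAL_NUMBER", r"[0-9]*\.[0-9]+"),
    ("IDENTIFIER", r"[0-9A-Z$_\u0080-\uffff]+"),
    ("DOT", r"\."),
    ("WHITESPACE", r"[ \t\f\r\n]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def naive_tokens(text):
    """Tokenize text, trying DECIMAL_NUMBER first at every position."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WHITESPACE":  # whitespace goes to the hidden channel
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(naive_tokens("A.0B"))
```

The ".0" is consumed as a DECIMAL_NUMBER, so the token stream never contains the DOT followed by IDENTIFIER that the parser rule needs.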
Is it possible to mark the position after the dot and return to it in the input stream when my action code determines this is not a decimal number?
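The mark-and-return idea can be sketched outside ANTLR as a tiny hand-written scanner in Python. This is only an illustration of the technique under my own assumptions: Scanner, try_decimal, is_ident_char and tokenize are hypothetical names, not ANTLR API, and a real solution would have to express the same check inside the generated lexer.

```python
def is_ident_char(ch):
    """True if ch is one character allowed by LETTER_WHEN_UNQUOTED."""
    if len(ch) != 1:
        return False
    return (ch in "$_" or "0" <= ch <= "9" or "A" <= ch <= "Z"
            or "\u0080" <= ch <= "\uffff")


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Empty string signals end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip_digits(self):
        start = self.pos
        while "0" <= self.peek() <= "9":
            self.pos += 1
        return self.pos - start

    def try_decimal(self):
        """Match DIGITS? '.' DIGITS, but rewind if an identifier char follows."""
        marker = self.pos  # mark the current position
        self.skip_digits()
        if self.peek() == ".":
            self.pos += 1
            if self.skip_digits() > 0 and not is_ident_char(self.peek()):
                return True
        self.pos = marker  # rewind: this is not a decimal number
        return False


def tokenize(text):
    scanner = Scanner(text)
    tokens = []
    while scanner.pos < len(text):
        start = scanner.pos
        ch = scanner.peek()
        if ch in " \t\f\r\n":
            scanner.pos += 1  # hidden channel: drop whitespace
        elif scanner.try_decimal():
            tokens.append(("DECIMAL_NUMBER", text[start:scanner.pos]))
        elif ch == ".":
            scanner.pos += 1
            tokens.append(("DOT", "."))
        elif is_ident_char(ch):
            while is_ident_char(scanner.peek()):
                scanner.pos += 1
            tokens.append(("IDENTIFIER", text[start:scanner.pos]))
        else:
            raise ValueError(f"unexpected character {ch!r} at {start}")
    return tokens


print(tokenize("A.0B"))
```

With the rewind in place, "A.0B" comes out as IDENTIFIER, DOT, IDENTIFIER, while ".5" on its own is still a DECIMAL_NUMBER.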
1..5
as NUMBER INTERVAL NUMBER. Another option would be to call "set_type(IDENTIFIER)" inside the DECIMAL_NUMBER rule's action if the digits are followed by other characters. – ibre5041
.0a an identifier, which is wrong. The correct way is to generate a DOT IDENTIFIER token sequence instead. – Mike Lischke