I have these lexer rules in my ANTLR3 grammar:
INTEGER: DIGITS;
FLOAT: DIGITS? DOT_SYMBOL DIGITS ('E' (MINUS_OPERATOR | PLUS_OPERATOR)? DIGITS)?;
HEXNUMBER: '0X' HEXDIGIT+;
HEXSTRING: 'X' '\'' HEXDIGIT+ '\'';
BITNUMBER: '0B' ('0' | '1')+;
BITSTRING: 'B' '\'' ('0' | '1')+ '\'';
NCHAR_TEXT: 'N' SINGLE_QUOTED_TEXT;
IDENTIFIER: LETTER_WHEN_UNQUOTED+;
fragment LETTER_WHEN_UNQUOTED:
'0'..'9'
| 'A'..'Z' // Only upper case, as we use a case insensitive parser (insensitive only for ASCII).
| '$'
| '_'
| '\u0080'..'\uffff'
;
and this parser rule:
qualified_identifier:
IDENTIFIER ( options { greedy = true; }: DOT_SYMBOL IDENTIFIER)?
;
This works mostly fine, except in very specific situations such as the input t1.1_d, which is supposed to be parsed as two identifiers connected by a dot. What happens instead is that .1 is matched as a float, even though it is followed by an underscore and letter(s).
It's clear where that comes from: LETTER_WHEN_UNQUOTED includes digits, so '1' can be both an integer and an identifier. The rule order should resolve this to an integer, as intended (and usually does).
However, I'm perplexed that the input t1.1_d causes the FLOAT rule to kick in, and I would appreciate some pointers on how to resolve this. As soon as I add a space after the dot, all is fine, but that is obviously not a real solution.
When I move the IDENTIFIER rule before the others, I get new trouble because several other rules can then no longer be matched. Moving the FLOAT rule after the IDENTIFIER rule doesn't fix the problem either (but at least it doesn't produce new problems). That case shows the actual problem: the dot is always matched by the FLOAT rule if it is directly followed by a digit. What can I do to keep FLOAT from matching in my case?
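The behavior can be reproduced outside ANTLR with a rough Python model. This is only a sketch under stated assumptions: it picks the longest match at each position and breaks ties by rule order, which is not exactly how ANTLR3 predicts tokens. DOT_SYMBOL is assumed to be a plain '.', the input is assumed to be upper-cased already, and the hex, bit and NCHAR rules are left out. The names RULES and tokenize are made up for this illustration.

```python
import re

# Simplified stand-ins for the lexer rules, in grammar order.
# Assumptions: DOT_SYMBOL is '.', DIGITS is [0-9]+, input is already upper case.
RULES = [
    ("INTEGER", r"[0-9]+"),
    ("FLOAT", r"[0-9]*\.[0-9]+(?:E[-+]?[0-9]+)?"),
    ("DOT_SYMBOL", r"\."),
    ("IDENTIFIER", r"[0-9A-Z$_\u0080-\uffff]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Split text into (rule, lexeme) pairs: longest match wins, ties go to the earlier rule."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        best_name, best_end = None, pos
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two rules match the same length.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("T1.1_D"))   # the dot and the digit end up in one FLOAT token
print(tokenize("T1. 1_D"))  # with a space, the dot is a DOT_SYMBOL on its own
```

In this model the first input yields IDENTIFIER, FLOAT, IDENTIFIER, while the second yields IDENTIFIER, DOT_SYMBOL, IDENTIFIER. The token decision is made without any knowledge that the parser expects an identifier after the dot, which is consistent with what I see in the real lexer.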