A vastly simplified version of my lexer rules (within a bigger grammar) are something like the following:
fragment HEX_DIGIT : [0-9A-F] ;
fragment DIGIT : [0-9] ;
SCIENTIFIC : 'E' [+-] ;
INTEGER : DIGIT+ ;
HEX_INTEGER : HEX_DIGIT+ ;
FLOAT_ZERO : '0'* '.' '0'+ ;
FLOAT : DIGIT* '.' DIGIT+ ;
The problem here comes with input such as 00E+00. The tokens I would like out of this are '00', 'E+', '00'. However, Antlr goes the greedy route and parses '00E' as a HEX_INTEGER, and in the full lexer, produces '+' and '00' tokens.
Any suggestions for handling this special case in the lexer? _input.LA() tricks don't seem to work as we are operating at the character level, so I'm not always sure how far I have to look ahead to look for the special 'E+' sequence at the end of the hex number.