0
votes

In ANTLR4 I have the following grammar:

ID : [_a-zA-Z][0-9_a-zA-Z]*;
INT_LITERAL : [0-9]+;
FLOAT_LITERAL :[0-9]+'.'?[0-9]*([eE][-+]?)?[0-9]+;

When parsing the string 123abc, I expect an error, but instead I get the tokens:

123
abc
<EOF>

I've tried adding EOF to the end of my int and float literal rules:

INT_LITERAL : [0-9]+EOF;
FLOAT_LITERAL :[0-9]+'.'?[0-9]*([eE][-+]?)?[0-9]+EOF;

but even then I still get a partial result:

bc
<EOF>

What should I change so that my grammar rejects the string 123abc?

1
Try using word boundaries \b like \b[_a-zA-Z][0-9_a-zA-Z]*\b or anchors ^ and $ - The fourth bird
@Thefourthbird those anchors are not valid in ANTLR unfortunately - Nanoboss

1 Answer

2
votes

Your lexer produces the correct result.

This type of error should be handled in the parser, not the lexer. The lexer only looks for the longest token it can match at the current position: for 123abc that is INT_LITERAL (123), followed by ID (abc). Do you have a parser rule that accepts INT_LITERAL followed by ID? I guess you don't. Let the parser do its job: if no such rule exists, the error you're expecting will be reported, but during parsing rather than lexical analysis.
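
As a minimal sketch of that idea, assuming a hypothetical grammar name Literals and a hypothetical entry rule start (neither appears in your question), a parser rule that allows exactly one token before EOF should reject 123abc. The lexer still emits 123 and abc, but the parser finds abc where it expects the end of input and reports a syntax error. The lexer rules are the ones from your question, with only whitespace added.

```antlr
grammar Literals;   // hypothetical name, for illustration only

// Hypothetical entry rule: exactly one literal or identifier, then end of input.
// For "123abc" the lexer produces INT_LITERAL, ID, EOF, which does not fit this
// rule, so the error is raised by the parser rather than the lexer.
start : (INT_LITERAL | FLOAT_LITERAL | ID) EOF ;

// Lexer rules as given in the question.
ID            : [_a-zA-Z][0-9_a-zA-Z]* ;
INT_LITERAL   : [0-9]+ ;
FLOAT_LITERAL : [0-9]+ '.'? [0-9]* ([eE] [-+]?)? [0-9]+ ;
```

The explicit EOF in the parser rule matters here: without it, the parser may stop after matching 123 and never look at the remaining abc token. EOF belongs in the entry parser rule, not in the lexer rules.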