Antlr4 mismatch input (partial match)

Question

In ANTLR4 I have the following grammar:

ID : [_a-zA-Z][0-9_a-zA-Z]*;
INT_LITERAL : [0-9]+;
FLOAT_LITERAL :[0-9]+'.'?[0-9]*([eE][-+]?)?[0-9]+;

When parsing the string 123abc, I'm expecting an error but instead I get the tokens:

123
abc
<EOF>

I've tried to add EOF at the end of my int and float literal regex,

INT_LITERAL : [0-9]+EOF;
FLOAT_LITERAL :[0-9]+'.'?[0-9]*([eE][-+]?)?[0-9]+EOF;

but even then I still get a partial parsing result:

bc
<EOF>

What should I modify in order to make my grammar not accept the string 123abc?

Try using word boundaries \b like \b[_a-zA-Z][0-9_a-zA-Z]*\b or anchors ^ and $ — The fourth bird
@Thefourthbird those anchors are not valid in ANTLR unfortunately — Nanoboss

Pavel Smirnov · Accepted Answer · 2019-07-07T15:29:57

Your lexer produces the correct result.

This type of error should be handled in the parser, not the lexer. The lexer only splits the input into tokens, and 123 followed by abc is a valid token sequence: an INT_LITERAL, then an ID. Do you have a parser rule that accepts INT_LITERAL followed by ID? I guess you don't. Let the parser do its job: if no such rule exists, the error you're expecting will be reported, but during parsing rather than during lexical analysis.
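To make this concrete, here is a minimal sketch of a grammar where the parser rejects 123abc. The lexer rules are the ones from the question, unchanged. The grammar name Numbers, the parser rules prog and value, and the WS rule are assumptions added for illustration; they are not part of the original grammar, and your real entry rule will likely look different.

```antlr
grammar Numbers;

// Hypothetical entry rule: exactly one value, then end of input.
// The explicit EOF forces the parser to consume every token.
prog : value EOF ;

// Hypothetical rule: a single literal or identifier.
// No alternative allows INT_LITERAL immediately followed by ID.
value
    : INT_LITERAL
    | FLOAT_LITERAL
    | ID
    ;

// Lexer rules from the question, unchanged.
ID            : [_a-zA-Z] [0-9_a-zA-Z]* ;
INT_LITERAL   : [0-9]+ ;
FLOAT_LITERAL : [0-9]+ '.'? [0-9]* ([eE] [-+]?)? [0-9]+ ;

// Assumed whitespace handling, not in the original grammar.
WS : [ \t\r\n]+ -> skip ;
```

With this sketch, the lexer still emits 123, abc and <EOF>, exactly as in the question. The parser matches 123 as a value, then expects EOF but finds the ID token abc, so it should report a syntax error at abc. Note that EOF belongs in the parser's entry rule here, not at the end of the lexer rules.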
