Hi I am new to antrl and have a problem that I am not able to solve during the last days:
I wanted to write a grammar that recognizes this text (in reality I want to parse something different, but for the case of this question I simplified it)
100abc
150100
200def
Here each rows starts with 3 digits, that identifiy the type of the line (header, content, trailer), than 3 characters follow, that are the payload of the line.
I thought I could recogize this with this grammar:
grammar Types;
file : header content trailer;
A : [a-z|A-Z|0-9];
NL: '\n';
header : '100' A A A NL;
content: '150' A A A NL;
trailer: '200' A A A NL;
But this does not work. When the lexer reads the "100" in the second line ("150100") it reads it into one token with 100 as the value and not as three Tokens of type A. So the parser sees a "100" token where it expects an A Token.
I am pretty sure that this happens because the Lexer wants to match the longest phrase for one Token, so it cluster together the '1','0','0'. I found no way to solve this. Putting the Rule A above the parser Rule that contains the string literal "100" did not work. And also factoring the '100' into a fragement as follows did not work.
grammar Types;
file : header content trailer;
A : [a-z|A-Z|0-9];
NL: '\n';
HUNDRED: '100';
header : HUNDRED A A A NL;
content: '150' A A A NL;
trailer: '200' A A A NL;
I also read some other posts like this:
antlr4 mixed fragments in tokens
Lexer, overlapping rule, but want the shorter match
But I did not think, that it solves my problem, or at least I don't see how that could help me.