I need to parse, with ANTLR4, a text file made up of many Data Blocks. Each Data Block has a Data Block Header (one line) followed by one or more DataRows (1..* lines).
The Data Block Header always starts with '1' at the first position of the line, followed by several alphanumeric fields.
A DataRow is also composed of alphanumeric fields (dataFields). The character '1' can be the first dataField of a row, but it is never located at the first position of the line.
This is a sample of the input to parse:
1 DataHeaderField1 datafield2 DataBlock1
DB1_Row1_F1 DB1_Row1_F2 DB1_Row1_F3 DataBlock1
DB1_Row2_F1 DB1_Row2_F2 DB1_Row2_F3 DataBlock1
1 DataHeaderField1 datafield2 DataBlock2
DB2_Row1_F1 DB2_Row1_F2 DB2_Row1_F3 DataBlock2
DB2_Row2_F1 DB2_Row2_F2 DB2_Row2_F3 DataBlock2
DB2_Row3_F1 DB2_Row3_F2 DB2_Row3_F3 DataBlock2
....
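To make the intended classification concrete, here is a minimal sketch in plain Java (no ANTLR involved) of the rule described above: a line is a header exactly when its first character is '1'. The class and method names (DataBlockSketch, isHeader, groupBlocks) are hypothetical and only illustrate the grouping I expect; this is not the grammar-based solution I am looking for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the grammar: it only illustrates
// how lines are expected to be grouped into data blocks.
final class DataBlockSketch {

    // Assumption taken from the description: a header line has '1' in column 0,
    // and a data row never has '1' in column 0.
    static boolean isHeader(String line) {
        return !line.isEmpty() && line.charAt(0) == '1';
    }

    // Groups lines into blocks; the first element of each block is its header.
    static List<List<String>> groupBlocks(List<String> lines) {
        List<List<String>> blocks = new ArrayList<>();
        for (String line : lines) {
            if (isHeader(line)) {
                // A header starts a new block.
                List<String> block = new ArrayList<>();
                block.add(line);
                blocks.add(block);
            } else if (!blocks.isEmpty()) {
                // Any other line is a data row of the current block.
                blocks.get(blocks.size() - 1).add(line);
            } else {
                throw new IllegalArgumentException("row before any header: " + line);
            }
        }
        return blocks;
    }
}
```

With this rule, a row such as " 1 abc" (with '1' as its first field but preceded by a space) stays a data row, which is the behaviour I want from the lexer.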
The grammar I tried is:
grammar ReadDataBlocks;
start_parsing: dataBlock+ EOF;
dataBlock: commonHeader row+;
commonHeader: ONE_AT_FIRST_POS APLHANUMERIC* NL ;
row: APLHANUMERIC+ NL;
ONE_AT_FIRST_POS: '1' {getCharPositionInLine() == 1}?;
APLHANUMERIC : (LETTER | DIGIT)+;
DIGIT: [0-9];
LETTER: [a-zA-Z];
NL: '\r'? '\n';
ESPACES : [ \t]+ -> skip;
To parse the file, I tried to give ONE_AT_FIRST_POS priority in the lexer, as shown in my grammar, by declaring it before the DIGIT token, so that a '1' detected at the first position of a line is not tokenized as DIGIT.
The problem is that when the parser encounters a '1' in any other position, it is still identified as ONE_AT_FIRST_POS, which produces the following message: