ANTLR 4 extraneous input matching non lexer item

Question

I have a grammar like this:

grammar MyGrammar;

field  : f1 (STROKE f2 f3)? ;

f1 : FIELDTEXT+ ;
f2 : 'A' ;
f3 : NUMBER4 ; 

FIELDTEXT    : ~['/'] ;
NUMBER4  : [0-9][0-9][0-9][0-9];
STROKE : '/' ;

This works well enough, and fields f1, f2 and f3 are all populated correctly.

However, whenever there is an A to the left of the / (whether or not the optional part is present), the parser additionally reports an error:

extraneous input 'A' expecting {<EOF>, FIELDTEXT, '/'}

Some sample Data:

PHOEN

-> OK.

KLM405/A4046

-> OK.

SAW502A

-> Not OK, 'A' is in f1.

BAW617/A5136

-> Not OK, 'A' is in f1.

I do not understand why 'A' is a problem here (the fields are still populated).

BernardK · Accepted Answer · 2017-10-21T21:15:46

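The problem with SAW502A is that 'A' is a separate token, implicitly defined by the literal in the parser rule:

f2 : 'A' ;

When this literal token and FIELDTEXT both match the same single character, the literal wins, so the lexer emits every A as 'A' and never as FIELDTEXT (it would be the same if 'A' were explicitly defined as a lexer rule ahead of FIELDTEXT):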

[@16,19:19='S',<FIELDTEXT>,3:0]
[@17,20:20='A',<'A'>,3:1]
[@18,21:21='W',<FIELDTEXT>,3:2]
[@19,22:22='5',<FIELDTEXT>,3:3]
[@20,23:23='0',<FIELDTEXT>,3:4]
[@21,24:24='2',<FIELDTEXT>,3:5]
[@22,25:25='A',<'A'>,3:6]
[@23,26:26='\n',<FIELDTEXT>,3:7]

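The rule f1 accepts nothing but FIELDTEXT, so every 'A' token inside f1 is reported as extraneous input. The parser recovers by skipping that token and carrying on, which is why the fields are still populated. It works with: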

f1 : ( FIELDTEXT | 'A' )+ ;
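
The lexer behaviour behind this can be imitated outside ANTLR. The following is only a rough Python sketch, not ANTLR code: RULES, tokenize and f1_accepts are names invented for this illustration, and the rule order assumes that the implicit 'A' literal sits ahead of the named lexer rules of the Question.g4 grammar shown further down. At each position it takes the longest match and, on a tie, the rule listed first:

```python
import re

# Rules in priority order (an assumption of this sketch). The implicit token
# for the literal 'A' used in parser rule f2 is listed first, ahead of the
# named lexer rules in the order they appear in Question.g4.
RULES = [
    ("'A'", r"A"),
    ("NUMBER4", r"[0-9]{4}"),
    ("STROKE", r"/"),
    ("NL", r"[\r\n]+"),
    ("WS", r"[ \t]+"),
    ("FIELDTEXT", r"[^/]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]
SKIPPED = {"WS"}  # mirrors "-> skip" in the grammar


def tokenize(text):
    """Return (type, text) pairs: longest match wins, then the earliest rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Only a strictly longer match replaces the current best, so on
            # equal length the rule listed earlier is kept.
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, match = best
        if name not in SKIPPED:
            tokens.append((name, match.group()))
        pos = match.end()
    return tokens


def f1_accepts(tokens, allowed):
    """True if the token list is non-empty and uses only the allowed types."""
    return bool(tokens) and all(kind in allowed for kind, _ in tokens)


tokens = tokenize("SAW502A")
print(tokens)
print(f1_accepts(tokens, {"FIELDTEXT"}))         # original f1: False
print(f1_accepts(tokens, {"FIELDTEXT", "'A'"}))  # fixed f1: True
```

Both occurrences of A come out as 'A' tokens, so an f1 that only allows FIELDTEXT rejects the sequence, while the version that also allows 'A' accepts it.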

File Question.g4:

grammar Question;

question
@init {System.out.println("Question last update 2305");}
    : line+ EOF
    ;
line
    : f1 (STROKE f2 f3)? NL
      {System.out.println("f1=" + $f1.text + " f2=" + $f2.text + " f3=" + $f3.text);}
    ;

f1 : ( FIELDTEXT | 'A' )+ ;
f2 : 'A' ;
f3 : NUMBER4 ; 

NUMBER4   : [0-9][0-9][0-9][0-9] ;
STROKE    : '/' ;
NL        : [\r\n]+ ; // -> channel(HIDDEN) ;
WS        : [ \t]+ -> skip ;
FIELDTEXT : ~[/] ;

Input file t.text:

PHOEN
KLM405/A4046
SAW502A
BAW617/A5136

Execution:

$ grun Question question -tokens -diagnostics t.text
[@0,0:0='P',<FIELDTEXT>,1:0]
[@1,1:1='H',<FIELDTEXT>,1:1]
[@2,2:2='O',<FIELDTEXT>,1:2]
[@3,3:3='E',<FIELDTEXT>,1:3]
[@4,4:4='N',<FIELDTEXT>,1:4]
[@5,5:5='\n',<NL>,1:5]
[@6,6:6='K',<FIELDTEXT>,2:0]
[@7,7:7='L',<FIELDTEXT>,2:1]
[@8,8:8='M',<FIELDTEXT>,2:2]
[@9,9:9='4',<FIELDTEXT>,2:3]
[@10,10:10='0',<FIELDTEXT>,2:4]
[@11,11:11='5',<FIELDTEXT>,2:5]
[@12,12:12='/',<'/'>,2:6]
[@13,13:13='A',<'A'>,2:7]
[@14,14:17='4046',<NUMBER4>,2:8]
[@15,18:18='\n',<NL>,2:12]
[@16,19:19='S',<FIELDTEXT>,3:0]
[@17,20:20='A',<'A'>,3:1]
[@18,21:21='W',<FIELDTEXT>,3:2]
[@19,22:22='5',<FIELDTEXT>,3:3]
[@20,23:23='0',<FIELDTEXT>,3:4]
[@21,24:24='2',<FIELDTEXT>,3:5]
[@22,25:25='A',<'A'>,3:6]
[@23,26:26='\n',<NL>,3:7]
[@24,27:27='B',<FIELDTEXT>,4:0]
[@25,28:28='A',<'A'>,4:1]
[@26,29:29='W',<FIELDTEXT>,4:2]
[@27,30:30='6',<FIELDTEXT>,4:3]
[@28,31:31='1',<FIELDTEXT>,4:4]
[@29,32:32='7',<FIELDTEXT>,4:5]
[@30,33:33='/',<'/'>,4:6]
[@31,34:34='A',<'A'>,4:7]
[@32,35:38='5136',<NUMBER4>,4:8]
[@33,39:39='\n',<NL>,4:12]
[@34,40:39='<EOF>',<EOF>,5:0]
Question last update 2305
f1=PHOEN f2=null f3=null
f1=KLM405 f2=A f3=4046
f1=SAW502A f2=null f3=null
f1=BAW617 f2=A f3=5136
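
As a cross-check on the four result lines above, the field split can be approximated with a regular expression. This is a sketch in Python rather than ANTLR, LINE and split_field are hypothetical names, and it ignores the lexer altogether, so it only confirms the expected split for these samples and says nothing about how ANTLR tokenizes other inputs:

```python
import re

# f1 is everything before the optional stroke; f2 is the literal A and
# f3 is exactly four digits, both present only when the stroke is.
LINE = re.compile(r"([^/]+)(?:/(A)([0-9]{4}))?")


def split_field(line):
    """Return (f1, f2, f3), with None for absent parts, or None if no match."""
    match = LINE.fullmatch(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


for sample in ["PHOEN", "KLM405/A4046", "SAW502A", "BAW617/A5136"]:
    print(sample, split_field(sample))
```

For SAW502A the trailing A stays in f1 and the optional part is absent, matching f2=null f3=null in the ANTLR output.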
