ANTLR4: matching token with same rule but with different position in the grammar

Question

I have the following statement I wish to parse:

in(name,(Silver,Gold))

in: a function.
name: an ID.
(Silver, Gold): a string array with the elements 'Silver' and 'Gold'.

The parser always gets confused because ID and the string array elements are matched by the same rule. Wrapping the strings in single or double quotes would help, but the input here is unquoted.

Predicates didn't help much either.

The grammar:

grammar Rql;

statement
 : EOF
 | query EOF
 ;

query
 : function
 ;

function
 : FUNCTION_IN OPAR id COMMA OPAR array CPAR CPAR
 ;

array
 : VALUE (COMMA VALUE)*
 ;

FUNCTION_IN: 'in';

id
 : {in(}? ID
 ;

ID
 : [a-zA-Z_] [a-zA-Z_0-9]*
 ;

VALUE
 : STRING
 | INT
 | FLOAT
 ;

OPAR : '(';
CPAR : ')';
COMMA : ',';

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]*
 | '.' [0-9]+
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

STRING
 :  [a-zA-Z_] [a-zA-Z_0-9]*
 ;

OTHER
 : .
 ;

It's not the parser that is confused, believe me. It always knows what to parse :-) So, what's the actual problem? What do you expect? — Mike Lischke
Is your input Silver,Gold correct ? Isn't it 'Silver', 'Gold' ? Without apostrophes, there is no chance they are matched by the STRING rule. — BernardK
@BernardK: Yes. The grammar works with strings with quotes 'Silver' and 'Gold'. If I removed the quotes, the parser will think it's an ID. — Amr Ellafy
Of course it's an ID -> LIT. Every piece of input starting with a letter is a LIT. I'm not sure if it is a good practice that a lexer rule calls other lexer rules. Why don't you make value a parser rule calling LIT instead of STRING ? — BernardK

BernardK · Accepted Answer · 2017-09-27T12:12:30

The idea is to change the type of the token under some condition. Here, the first ID matched in each in(...) call sets a switch to true. For every later ID, the lexer action sees that the switch is set and changes the token type to ID_VALUE. I first tried to reset the switch on entering the parser rule function, but that doesn't work:

function
@init {QuestionLexer.id_seen = false; System.out.println("id_seen has been reset " + QuestionLexer.id_seen);}
 : FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR
 ;

ID=name1  seen ? false
ID=Silver  seen ? true
...
ID=Platinum  seen ? true
[@0,0:1='in',<'in'>,1:0]
[@1,2:2='(',<'('>,1:2]
[@2,3:7='name1',<ID>,1:3]
[@3,8:8=',',<','>,1:8]
[@4,9:9='(',<'('>,1:9]
[@5,10:15='Silver',<10>,1:10]
...
[@12,27:31='name2',<10>,2:3]
...
[@20,52:51='<EOF>',<EOF>,3:0]
Question last update 1336
id_seen has been reset false
id_seen has been reset false
line 2:3 mismatched input 'name2' expecting ID

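The output above shows why: both "id_seen has been reset" messages appear after the token dump, so every token, including name2, already had its type before the parser's @init action ran. That's why I reset the switch in the FUNCTION_IN lexer rule instead.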
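
The ordering problem can be reproduced without ANTLR. The following Java sketch is only an illustration, assuming (as the output above suggests) that the whole input is tokenized before the parser runs. The class LexAheadSketch and the method parseAfterFullLex are hypothetical names, not part of ANTLR or of the generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the ANTLR runtime or the generated code.
class LexAheadSketch {

    // Each inner array holds the identifiers of one in(...) call, in input order.
    static String parseAfterFullLex(String[][] calls) {
        // Phase 1: the "lexer" types every identifier before any parsing happens.
        boolean idSeen = false;
        List<String> texts = new ArrayList<>();
        List<String> types = new ArrayList<>();
        for (String[] call : calls) {
            for (String word : call) {
                texts.add(word);
                types.add(idSeen ? "ID_VALUE" : "ID");
                idSeen = true;
            }
        }
        // Phase 2: the "parser" resets the flag on entering each call,
        // but the token types were already fixed in phase 1.
        int pos = 0;
        for (String[] call : calls) {
            idSeen = false; // too late: nothing reads the flag any more
            if (!types.get(pos).equals("ID")) {
                return "mismatched input '" + texts.get(pos) + "' expecting ID";
            }
            pos += call.length;
        }
        return "ok";
    }

    public static void main(String[] args) {
        String[][] calls = {
            {"name1", "Silver", "Gold"},
            {"name2", "Copper", "Platinum"}
        };
        System.out.println(parseAfterFullLex(calls));
    }
}
```

With the two calls from t.text, the sketch reports the same mismatch on name2 as the failed run, because the reset happens only after name2 has been typed as ID_VALUE.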

Grammar Question.g4 :

grammar Question;

@lexer::members {
    static boolean id_seen = false;
}

tokens { ID_VALUE }

question
@init {System.out.println("Question last update 1352");}
 : function+ EOF
 ;

function
 : FUNCTION_IN OPAR ID COMMA OPAR array CPAR CPAR
 ;

array
 : value (COMMA value)*
 ;

value
 : ID_VALUE
 | INT
 | FLOAT
 ;

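// Each new function call starts with no ID seen yet.
FUNCTION_IN: 'in' {id_seen = false;} ;

ID : [a-zA-Z_] [a-zA-Z_0-9]*
     {
       System.out.println("ID=" + getText() + "  seen ? " + id_seen);
       // Every ID after the first one since the last FUNCTION_IN token is retyped as ID_VALUE.
       if (id_seen) setType(QuestionParser.ID_VALUE);
       id_seen = true;
     } ;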

OPAR : '(';
CPAR : ')';
COMMA : ',';

INT
 : [0-9]+
 ;

FLOAT
 : [0-9]+ '.' [0-9]*
 | '.' [0-9]+
 ;

SPACE
 : [ \t\r\n] -> skip
 ;

OTHER
 : .
 ;

File t.text :

in(name1,(Silver,Gold))
in(name2,(Copper,Platinum))

Execution with ANTLR 4.6 :

$ grun Question question -tokens -diagnostics t.text
ID=name1  seen ? false
ID=Silver  seen ? true
ID=Gold  seen ? true
ID=name2  seen ? false
ID=Copper  seen ? true
ID=Platinum  seen ? true
[@0,0:1='in',<'in'>,1:0]
[@1,2:2='(',<'('>,1:2]
[@2,3:7='name1',<ID>,1:3]
[@3,8:8=',',<','>,1:8]
[@4,9:9='(',<'('>,1:9]
[@5,10:15='Silver',<10>,1:10]
[@6,16:16=',',<','>,1:16]
[@7,17:20='Gold',<10>,1:17]
[@8,21:21=')',<')'>,1:21]
[@9,22:22=')',<')'>,1:22]
[@10,24:25='in',<'in'>,2:0]
[@11,26:26='(',<'('>,2:2]
[@12,27:31='name2',<ID>,2:3]
[@13,32:32=',',<','>,2:8]
[@14,33:33='(',<'('>,2:9]
[@15,34:39='Copper',<10>,2:10]
[@16,40:40=',',<','>,2:16]
[@17,41:48='Platinum',<10>,2:17]
[@18,49:49=')',<')'>,2:25]
[@19,50:50=')',<')'>,2:26]
[@20,52:51='<EOF>',<EOF>,3:0]
Question last update 1352

Type <10> is ID_VALUE, as can be seen in the .tokens file:

$ cat Question.tokens 
FUNCTION_IN=1
...
OTHER=9
ID_VALUE=10
'in'=1
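
To see the switch in isolation, here is a minimal stand-alone Java sketch of the same idea outside ANTLR. It is an illustration, not the generated lexer: the class TokenTypeSwitchSketch and the method tokenize are made-up names, and the sketch simplifies the grammar by not handling numbers and by turning any unexpected character into OTHER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use the ANTLR runtime.
class TokenTypeSwitchSketch {

    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdPart(char c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }

    // Returns one "TYPE:text" entry per token, mimicking the lexer actions of Question.g4.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean idSeen = false; // plays the role of id_seen
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { // like SPACE -> skip
                i++;
            } else if (isIdStart(c)) {
                int start = i;
                while (i < input.length() && isIdPart(input.charAt(i))) {
                    i++;
                }
                String text = input.substring(start, i);
                if (text.equals("in")) {
                    idSeen = false; // like the action on FUNCTION_IN
                    tokens.add("FUNCTION_IN:" + text);
                } else {
                    // The first identifier after "in" stays ID, later ones become ID_VALUE.
                    tokens.add((idSeen ? "ID_VALUE:" : "ID:") + text);
                    idSeen = true;
                }
            } else {
                String type = c == '(' ? "OPAR" : c == ')' ? "CPAR" : c == ',' ? "COMMA" : "OTHER";
                tokens.add(type + ":" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "in(name1,(Silver,Gold))\nin(name2,(Copper,Platinum))";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

For the two lines of t.text, name1 and name2 come out as ID and the four metal names as ID_VALUE, which matches the token types in the grun output above.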
