ANTLR grammar not working as expected. What am I doing wrong?

Question

I have this grammar below for implementing an IN operator taking a list of numbers or strings.

grammar listFilterExpr;

entityIdNumberProperty
    : 'a.Id'
    | 'c.Id'
    | 'e.Id'
    ;
    
entityIdStringProperty
    : 'f.phone'
    ;

listFilterExpr
    : entityIdNumberListFilter
    | entityIdStringListFilter
    ;

listOperator
    : '$in:'
    ;

entityIdNumberListFilter
    : entityIdNumberProperty listOperator numberList
    ;

entityIdStringListFilter
    : entityIdStringProperty listOperator stringList
    ;

numberList: '[' ID (',' ID)* ']';

fragment ID: [1-9][0-9]*;

stringList: '[' STRING (',' STRING)* ']';

STRING
    : '"' (ESC | SAFECODEPOINT)* '"'
    ;

fragment ESC
    : '\\' (["\\/bfnrt] | UNICODE)
    ;

fragment SAFECODEPOINT
    : ~ ["\\\u0000-\u001F]
    ;

If I try to parse the following input:

c.Id $in: [1,1]

Then I get the following error in the parser:

mismatched input '1' expecting ID

Please help me to correct this grammar.

Update

I found the following rule much further up in my project's (huge) grammar file. It might be matching '1' before the lexer ever gets to ID:

NUMBER
   : '-'? INT ('.' [0-9] +)?
   ;
fragment INT
   : '0' | [1-9] [0-9]*
   ;

But if I put my ID rule before NUMBER, other things fail, because input that should have matched NUMBER is now matched as ID.

What should I do?

@teenup then you probably did not regenerate the lexer and parser, because it works once fragment is removed. The other possibility is that you stripped too many rules from what you posted, and your real grammar has conflicting lexer rules that you didn't post in your original question. Always post a self-contained example so that others see what you see. - Bart Kiers

Bart Kiers · Accepted Answer · 2020-10-13T11:54:57

As mentioned by rici, ID should not be a fragment. Fragment rules can only be used by other lexer rules; they never become tokens on their own, and therefore cannot be used in parser rules.

Just remove the fragment keyword from it: ID: [1-9][0-9]*;

Note that you'll also have to account for spaces. You probably want to skip them:

SPACES : [ \t\r\n] -> skip;
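Both points (a fragment never becomes a token, and skipped input produces no token) can be seen in a small Python model. This is a simplified sketch, not the ANTLR runtime: LexRule, tokenize and list_rules are hypothetical names, the regexes are hand translations of the rules in the question, and literal tokens are named by their quoted text just for readability. With ID declared as a fragment, the digits in [1, 1] match no token rule at all; without fragment, they become ID tokens.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LexRule:
    name: str
    pattern: str            # regex standing in for the rule body
    fragment: bool = False  # fragments never become tokens on their own
    skip: bool = False      # like "-> skip": matched, then discarded

def tokenize(rules: List[LexRule], text: str) -> List[Tuple[str, str]]:
    # Only non-fragment rules are candidates for producing tokens.
    candidates = [(r, re.compile(r.pattern)) for r in rules if not r.fragment]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (rule, end position of its match)
        for rule, regex in candidates:
            m = regex.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (rule, m.end())
        if best is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        rule, end = best
        if not rule.skip:
            tokens.append((rule.name, text[pos:end]))
        pos = end
    return tokens

def list_rules(id_is_fragment: bool) -> List[LexRule]:
    # Hypothetical subset of the question's lexer rules plus the SPACES rule.
    return [
        LexRule("'['", r"\["),
        LexRule("','", r","),
        LexRule("']'", r"\]"),
        LexRule("ID", r"[1-9][0-9]*", fragment=id_is_fragment),
        LexRule("SPACES", r"[ \t\r\n]", skip=True),
    ]
```

Calling tokenize(list_rules(False), "[1, 1]") yields '[', ID, ',', ID, ']' with the space skipped, while tokenize(list_rules(True), "[1, 1]") fails at '1' because no remaining rule matches a digit.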

... mismatched input '1' expecting ID ...

This suggests that another lexer rule, besides ID, also matches the input 1 and is defined before ID. In that case, have a look at this Q&A: ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

EDIT

Because you have the rules ordered like this:

NUMBER
   : '-'? INT ('.' [0-9] +)?
   ;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

ID
   : [1-9][0-9]*
   ;

the lexer will never create an ID token (only NUMBER tokens will be created). This is simply how ANTLR works: the longest match always takes priority, and when two or more lexer rules match the same number of characters, the one defined first "wins".
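The tie-breaking rule can be modelled in a few lines of Python. This is a simplified sketch, not ANTLR's actual DFA-based lexer; tokenize, number_first and id_first are hypothetical names, and the regexes are hand translations of the NUMBER, INT and ID rules shown above.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (token name, regex); the rule named "WS" is skipped

INT = r"(?:0|[1-9][0-9]*)"          # fragment INT : '0' | [1-9] [0-9]*
NUMBER = rf"-?{INT}(?:\.[0-9]+)?"   # NUMBER : '-'? INT ('.' [0-9]+)?
ID = r"[1-9][0-9]*"                 # ID : [1-9][0-9]*
WS = r"[ \t\r\n]+"

def tokenize(rules: List[Rule], text: str) -> List[Tuple[str, str]]:
    compiled = [(name, re.compile(p)) for name, p in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly longer only: on a tie the earlier rule keeps the win.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

number_first = [("NUMBER", NUMBER), ("ID", ID), ("WS", WS)]
id_first = [("ID", ID), ("NUMBER", NUMBER), ("WS", WS)]
```

With number_first, tokenize(number_first, "1 12 -3 4.5") produces only NUMBER tokens, which is the "mismatched input '1' expecting ID" situation. With id_first, 1 and 12 become ID instead, which is the breakage described in the question's update, while -3 and 4.5 remain NUMBER because those matches are longer than anything ID can match.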

First of all, I find it odd to have an ID rule that matches only digits, but if that's the language you're parsing, fine. In your case, you could do something like this:

id     : POS_NUMBER;
number : POS_NUMBER | NEG_NUMBER;

POS_NUMBER : INT ('.' [0-9] +)?;
NEG_NUMBER : '-' POS_NUMBER;

fragment INT
   : '0' | [1-9] [0-9]*
   ;

Then use id instead of ID in your parser rules, and number instead of the NUMBER you're using now.
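To see how the id/number split behaves, here is a toy Python lexer and list parser. It is a sketch under stated assumptions, not generated ANTLR code: lex, parse_list, TOKEN_RE, ID_TYPES and NUMBER_TYPES are hypothetical names, and parse_list stands in for a numberList rule written as '[' id (',' id)* ']' (or with number in place of id).

```python
import re
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (token type, matched text)

# Hand translation of the answer's lexer rules into one regex. An ordered
# alternation is enough here because the patterns never compete for the
# same input: NEG_NUMBER must start with '-', POS_NUMBER cannot.
TOKEN_RE = re.compile(
    r"(?P<NEG_NUMBER>-(?:0|[1-9][0-9]*)(?:\.[0-9]+)?)"  # '-' POS_NUMBER
    r"|(?P<POS_NUMBER>(?:0|[1-9][0-9]*)(?:\.[0-9]+)?)"  # INT ('.' [0-9]+)?
    r"|(?P<PUNCT>[\[\],])"                               # '[' ',' ']'
    r"|(?P<WS>[ \t\r\n]+)"                               # skipped
)

def lex(text: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        kind = m.lastgroup
        if kind == "PUNCT":
            kind = m.group()  # literal tokens are typed by their own text
        if kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

# Parser rules from the answer:
#   id     : POS_NUMBER;
#   number : POS_NUMBER | NEG_NUMBER;
ID_TYPES: Set[str] = {"POS_NUMBER"}
NUMBER_TYPES: Set[str] = {"POS_NUMBER", "NEG_NUMBER"}

def parse_list(tokens: List[Token], element_types: Set[str]) -> List[str]:
    """Match '[' element (',' element)* ']' and return the element texts."""
    pos = 0

    def expect(types: Set[str]) -> str:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in types:
            found = tokens[pos][1] if pos < len(tokens) else "<EOF>"
            raise ValueError(f"mismatched input {found!r} expecting {sorted(types)}")
        pos += 1
        return tokens[pos - 1][1]

    expect({"["})
    items = [expect(element_types)]
    while pos < len(tokens) and tokens[pos][0] == ",":
        pos += 1
        items.append(expect(element_types))
    expect({"]"})
    if pos != len(tokens):
        raise ValueError(f"extraneous input {tokens[pos][1]!r}")
    return items
```

Here parse_list(lex("[1, 1]"), ID_TYPES) returns ['1', '1']; with ID_TYPES a list such as [1, -2] is rejected at '-2', while NUMBER_TYPES accepts it. Note that, as with the answer's rules, id also accepts 0 and decimals such as 1.5, because both lex as POS_NUMBER.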
