1
votes

I have the following ANTLR grammar:

relation
  : IDENTIFIER EQUAL relative_date
; 
relative_date
 : K_NOW (PLUS | MINUS) NUMERIC_LITERAL TIME_UNIT
;

IDENTIFIER 
 : //'"' (~'"' | '""')* '"'
 '`' (~'`' | '``')* '`'
 | '[' ~']'* ']'
 | [a-zA-Z_] [a-zA-Z_.0-9]* 
;

TIME_UNIT
 : ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
;

PLUS : '+';
MINUS : '-';
EQUAL: '=';
K_NOW : N O W;
NUMERIC_LITERAL
 : [0-9]+ ;

If I put TIME_UNIT before IDENTIFIER in the grammar:

  • something = now - 5d works
  • d = now - 5d DOES NOT work: it fails at the first d and says an IDENTIFIER is required

If I put TIME_UNIT after IDENTIFIER in the grammar:

  • something = now - 5d fails at the second d and says TIME_UNIT required
  • d = now - 5d fails at the second d and says TIME_UNIT required

How can I change the grammar so that both cases work? That is, I want the TIME_UNIT lexer rule to apply inside a relative date and the IDENTIFIER lexer rule to apply everywhere else.

2 Answers

3
votes

ANTLR's lexer tries to match as many characters as possible. When two or more lexer rules match the same number of characters, the rule defined first "wins".

So the input d matches both TIME_UNIT and IDENTIFIER, but because IDENTIFIER is defined first in your grammar, it wins. In other words, the rule TIME_UNIT will never be matched.
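To make the two rules above concrete, here is a toy model in Python of "longest match, first rule wins on a tie". It is only a sketch, not how ANTLR is actually implemented: the regular expressions approximate the lexer rules from the question, the backtick and bracket alternatives of IDENTIFIER are left out, and K_NOW is assumed to be a case-insensitive "now" because the N, O and W fragments are not shown.

```python
import re

# Toy model of "longest match, first rule wins ties". Not ANTLR's real algorithm.
def tokenize(text, rules):
    """rules is an ordered list of (name, regex); earlier rules win ties."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace is simply skipped in this sketch
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly greater: on equal length the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Approximations of the lexer rules (assumptions, see the text above).
TIME_UNIT = ("TIME_UNIT", r"[hmsdwMyq]")
K_NOW = ("K_NOW", r"[nN][oO][wW]")
IDENTIFIER = ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z_.0-9]*")
OTHERS = [("NUMERIC_LITERAL", r"[0-9]+"), ("PLUS", r"\+"),
          ("MINUS", r"-"), ("EQUAL", r"=")]

identifier_first = [IDENTIFIER, TIME_UNIT, K_NOW] + OTHERS
time_unit_first = [TIME_UNIT, K_NOW, IDENTIFIER] + OTHERS
```

Under this model, d comes out as an IDENTIFIER with identifier_first and as a TIME_UNIT with time_unit_first, while something is an IDENTIFIER with either ordering because IDENTIFIER matches more characters than TIME_UNIT does.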

The solution is to put TIME_UNIT before IDENTIFIER:

TIME_UNIT
 : ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
 ;

K_NOW
 : N O W
 ;

IDENTIFIER 
 : //'"' (~'"' | '""')* '"'
   '`' (~'`' | '``')* '`'
 | '[' ~']'* ']'
 | [a-zA-Z_] [a-zA-Z_.0-9]* 
 ;

(Note that K_NOW also needs to be placed before IDENTIFIER, otherwise the input now would be tokenized as an IDENTIFIER!)

However, the inputs d, h, m, etc. will now never become an IDENTIFIER, because they will always be tokenized as a TIME_UNIT. You cannot change this; that is how ANTLR's lexer works. You can handle it in the parser like this:

identifier
 : IDENTIFIER
 | TIME_UNIT
 ;

TIME_UNIT
 : ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
 ;

IDENTIFIER 
 : //'"' (~'"' | '""')* '"'
   '`' (~'`' | '``')* '`'
 | '[' ~']'* ']'
 | [a-zA-Z_] [a-zA-Z_.0-9]* 
 ;

and then use the rule identifier in your parser rules instead of IDENTIFIER:

relation
 : identifier EQUAL relative_date
 ;
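
As a rough check of why this works, the following Python sketch models the parser side only. It takes a list of token type names (as the lexer would produce them with TIME_UNIT defined first) and tests them against relation with the identifier rule expanded. The function name parse_relation is made up for this illustration and is not part of ANTLR.

```python
def parse_relation(token_types):
    """Return True if the token types match
    identifier EQUAL K_NOW (PLUS | MINUS) NUMERIC_LITERAL TIME_UNIT."""
    expected = [
        {"IDENTIFIER", "TIME_UNIT"},  # parser rule: identifier
        {"EQUAL"},
        {"K_NOW"},
        {"PLUS", "MINUS"},
        {"NUMERIC_LITERAL"},
        {"TIME_UNIT"},
    ]
    return len(token_types) == len(expected) and all(
        t in allowed for t, allowed in zip(token_types, expected)
    )
```

Both something = now - 5d (first token IDENTIFIER) and d = now - 5d (first token TIME_UNIT) are accepted by this check, whereas a version that only allowed IDENTIFIER in the first position would reject the second input.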
-1
votes

You could combine NUMERIC_LITERAL TIME_UNIT into a single lexer rule, DURATION, and parse the duration text yourself:

relative_date
 : K_NOW (PLUS | MINUS) DURATION
;

DURATION
 : [0-9]+ SPACE* ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
;

SPACE
 : [ \u000B\t\r\n]
;
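
"Parsing the duration yourself" could look roughly like the Python sketch below, which splits the text of a DURATION token into its number and unit. The regular expression mirrors the DURATION and SPACE rules above. The function name split_duration is hypothetical, and how you obtain the token text depends on your ANTLR target.

```python
import re

# Mirrors DURATION: digits, optional SPACE characters, then one unit letter.
_DURATION = re.compile(r"([0-9]+)[ \x0b\t\r\n]*([hmsdwMyq])")

def split_duration(text):
    """Split DURATION token text such as '5d' into (5, 'd')."""
    m = _DURATION.fullmatch(text)
    if m is None:
        raise ValueError(f"not a duration: {text!r}")
    return int(m.group(1)), m.group(2)
```

For example, split_duration("5d") returns (5, "d"). The unit stays case-sensitive, so "m" and "M" remain distinct, as in the grammar.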