I have a problem figuring out how to parse a date in my grammar.
The thing is that it shares its definition with a String, but according to the Antlr 4 documentation, it should follow the precedence by looking at the order of declaration.
Here is my grammar:
grammar formula;
/* entry point */
parse: expr EOF;
expr
: value # argumentArithmeticExpr
| l=expr operator=('*'|'/'|'%') r=expr # multdivArithmeticExpr // TODO: test the % operator
| l=expr operator=('+'|'-') r=expr # addsubtArithmeticExpr
| '-' expr # minusArithmeticExpr
| FUNCTION_NAME '(' (expr ( ',' expr )* ) ? ')'# functionExpr
| '(' expr ')' # parensArithmeticExpr
;
value
: number
| variable
| date
| string
| bool;
/* Atomes */
bool
: BOOL
;
variable
: '[' (~(']') | ' ')* ']'
;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
string
: STRING_LITERAL
;
number
: ('+'|'-')? NUMERIC_LITERAL
;
/* lexemes de base */
QUOTE : '\'';
DQUOTE : '"';
MINUS : '-';
COLON : ':';
DOT : '.';
PIPE : '|';
BOOL : T R U E | F A L S E;
FUNCTION_NAME: IDENTIFIER ;
IDENTIFIER
: [a-zA-Z_] [a-zA-Z_0-9]* // TODO: do we more chars in this set?
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )? // ex: 0.05e3
| '.' DIGIT+ ( E [-+]? DIGIT+ )? // ex: .05e3
;
INT: DIGIT+;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
| '"' ( ~'"' | '""' )* '"'
;
WS: [ \t\n]+ -> skip;
UNEXPECTED_CHAR: . ;
fragment DIGIT: [0-9];
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The important part here is this:
value
: number
| variable
| date
| string
| bool;
date
: DQUOTE date_format DQUOTE
| QUOTE date_format QUOTE
;
date_format
: year=INT '-' month=INT '-' day=INT (hour=INT ':' minutes=INT ':' seconds=INT)?
;
My grammar expects these things:
"a quoted string"
-> gives astring
"2015-03 TOTOTo"
-> gives astring
because the date format doesn't match."2015-03-15"
-> gives adate
because it matchesDQUOTE INT '-' INT '-' INT DQUOTE
And I (tried?) to make sure that the parser tries to match a date before trying to match a string: value: ...| date | string| ...
.
But when I use the grun
utility (and my unit tests...), I can see that it categorizes the date as a string, like if it never bothered to check the date format.
Can you tell me why it is so? I suspect there's a catch with the order in which I declare my grammar rules, but I tried some permutations and didn't get anything.