I'd like to build a natural language date parser in ANTLR4 and got stuck on ignoring "noise" input. The simplified grammar below parses any string that consists solely of dates in the format DATE MONTH:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
Text such as "1 January 22 February" will be accepted. I wanted the grammar to accept other text as well, so I added ANY : . -> skip; at the end:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
ANY : . -> skip;
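This doesn't quite do what I want, however. A string such as "On 1 January and 22 February" is accepted and the simple_date rule is matched twice, but the string "On 1XX January" also matches the rule: the skipped characters never reach the parser, so it still sees DATE immediately followed by MONTH.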
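To illustrate what I believe is happening, here is a rough Python simulation of the lexer. It is only a sketch, not ANTLR itself: the regular expression and the visible_tokens helper are my own stand-ins for the DATE, MONTH and ANY rules, and it assumes that skipped tokens are simply dropped before the parser sees them.

```python
import re

# Rough stand-in for the lexer rules MONTH, DATE and ANY (assumption: this
# regex only approximates ANTLR's longest-match behaviour for these inputs).
TOKEN_RE = re.compile(
    r"(?P<MONTH>January|February|March)|(?P<DATE>[0-9][0-9]?)|(?P<ANY>.)",
    re.DOTALL,
)

def visible_tokens(text):
    """Return the (type, text) pairs left after ANY tokens are skipped."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup != "ANY":  # mimics `ANY : . -> skip;`
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(visible_tokens("On 1 January"))    # [('DATE', '1'), ('MONTH', 'January')]
print(visible_tokens("On 1XX January"))  # [('DATE', '1'), ('MONTH', 'January')]
```

Both inputs leave the same two tokens, which would explain why simple_date matches in either case.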
Question: How do I build a grammar where rules are matched only with the exact token sequence while ignoring all other input, including tokens in an order not defined in any of the rules? Consider the following cases:
"From 1 January to 2 February" -> simple_date matches "1 January" and "2 February"
"From 1XX January to 2 February" -> simple_date matches "2 February", rest is ignored
"From January to February" -> no match, everything ignored
date
todates
in the top-level rule to make it work as described. – David