Antlr grammar not matching expected lexer rule

Question

I'm trying to match a duration string, such as "for 30 minutes" or "for 2 hours", using the following rules:

durationPhrase: FOR_STR (MINUTE_DURATION | HOUR_DURATION);

MINUTE_DURATION: NONZERO_NUMBER MINUTE_STR;

HOUR_DURATION: NONZERO_NUMBER HOUR_STR;

MINUTE_STR: 'minute'('s')?;

HOUR_STR: 'hour'('s')?;

FOR_STR: 'for';

NONZERO_NUMBER: [0-9]+;

WS: (' '|[\n\t\r]) -> skip;

With the following input:

for 30 minutes

When I try to debug/match the durationPhrase rule, I get this error:

line 1:4 mismatched input '30' expecting {MINUTE_DURATION, HOUR_DURATION}

But I can't figure out which lexer rule the '30' is matching. I was under the impression that the "longest" lexer match would win, which would be the MINUTE_DURATION rule.

Is it instead matching NONZERO_NUMBER first? And if so, why?

rici · Accepted Answer · 2018-06-25T13:43:34

It's matching NONZERO_NUMBER because neither of the other patterns applies. If you had entered 30minutes, it would have matched MINUTE_DURATION, but as a token pattern, MINUTE_DURATION can't match the space character between 30 and minutes.

You ignore whitespace by applying -> skip to the token WS. That can only happen after WS is recognised as a token; i.e. after tokenisation. During tokenisation, whitespace characters are just characters.

If you make MINUTE_DURATION and HOUR_DURATION syntax rules rather than lexical rules, it should work as expected.
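
A minimal sketch of that change, assuming the rest of the grammar stays as in the question. The grammar name Duration and the rule names minuteDuration and hourDuration are made up for illustration:

```antlr
grammar Duration; // hypothetical grammar name

// Parser (syntax) rules start with a lower-case letter. They are built from
// tokens, so the skipped WS between '30' and 'minutes' no longer matters.
durationPhrase : FOR_STR (minuteDuration | hourDuration) ;
minuteDuration : NONZERO_NUMBER MINUTE_STR ;
hourDuration   : NONZERO_NUMBER HOUR_STR ;

// Lexer rules, unchanged from the question.
MINUTE_STR     : 'minute' ('s')? ;
HOUR_STR       : 'hour' ('s')? ;
FOR_STR        : 'for' ;
NONZERO_NUMBER : [0-9]+ ;
WS             : (' ' | [\n\t\r]) -> skip ;
```

With this version, the input for 30 minutes should be tokenised as FOR_STR, NONZERO_NUMBER, MINUTE_STR (the WS tokens are dropped), which is the sequence durationPhrase expects.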
