I have the following grammar:

grammar Hello;

prog:   stat+ EOF;

stat:   DELIMITER_OPEN expr DELIMITER_CLOSE;
expr:   NOTES COMMA value=VAR_VALUE #delim_body;

VAR_VALUE:  ANBang*;
NOTES:  WS* 'notes' WS*;
COMMA:  ',';
DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE:    '!>>';

fragment ANBang:    AlphaNum | Bang;
fragment AlphaNum:  [a-zA-Z0-9];
fragment Bang:  '!';

WS    : [ \t\r\n]+ -> skip ;

Parsing the following works:

<<! notes, Test !>>

and the variable value is "Test". However, the parser fails when I remove the space between DELIMITER_OPEN and NOTES:

<<!notes, Test !>>

line 1:3 mismatched input 'notes' expecting NOTES

Since you skip whitespace, try without WS* in the NOTES rule. - Mike Lischke

1 Answer

This is yet another case of badly ordered lexer rules.

When the lexer scans for the next token, it first looks for the rule that matches the longest run of input. If several rules match that same longest length, it disambiguates by choosing the one defined first in the grammar.

<<! notes, Test !>> will be tokenized as follows (the WS tokens are matched but then skipped, so the parser never sees them):
DELIMITER_OPEN NOTES COMMA WS VAR_VALUE WS DELIMITER_CLOSE

This is because the NOTES rule can match the following:

<<! notes, Test !>>
   \____/

That match includes the leading whitespace, so it is longer than anything VAR_VALUE can match at that position. If you remove the space:

<<!notes, Test !>>

Then both the NOTES and VAR_VALUE rules match the same text, notes, and since VAR_VALUE is defined first in the grammar, it takes precedence. The tokenization is:
DELIMITER_OPEN VAR_VALUE COMMA WS VAR_VALUE WS DELIMITER_CLOSE
which doesn't match your expr rule once the WS tokens are dropped.

Change your rules like this to fix the problem:

NOTES:  'notes';
VAR_VALUE:  ANBang+;

Adding WS* to other rules doesn't make much sense, since WS is skipped. Declaring a token that can match zero characters (because of *) is also meaningless, so use + instead. Finally, reorder the rules so that the most specific ones come first.
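
Putting those three changes together, the whole grammar might look like the sketch below. Only NOTES, VAR_VALUE and their relative order differ from the grammar in the question; the comments are mine.

```antlr
grammar Hello;

prog:   stat+ EOF;

stat:   DELIMITER_OPEN expr DELIMITER_CLOSE;
expr:   NOTES COMMA value=VAR_VALUE #delim_body;

// Keyword first: when two rules match the same length, the earlier one wins.
NOTES:  'notes';
COMMA:  ',';
DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE:    '!>>';
// One or more characters, so this token can never be empty.
VAR_VALUE:  ANBang+;

fragment ANBang:    AlphaNum | Bang;
fragment AlphaNum:  [a-zA-Z0-9];
fragment Bang:  '!';

WS    : [ \t\r\n]+ -> skip ;
```

With this ordering, both <<! notes, Test !>> and <<!notes, Test !>> should reach the parser as DELIMITER_OPEN NOTES COMMA VAR_VALUE DELIMITER_CLOSE.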

This way, notes becomes a keyword in your grammar. If you don't want it to be a keyword, remove the NOTES rule altogether, and use the VAR_VALUE rule with a predicate. Alternatively, you could use lexer modes.
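
For the predicate route, a minimal sketch could look like this. It assumes the Java target (the code inside {...}? is target-specific), and the grammar name HelloPredicate is made up for the example. The idea is to drop the NOTES token and check the text of the first VAR_VALUE in the parser instead.

```antlr
grammar HelloPredicate;

prog:   stat+ EOF;

stat:   DELIMITER_OPEN expr DELIMITER_CLOSE;

// Java-target predicate: the alternative is only viable when the
// next token's text is exactly "notes".
expr:   {_input.LT(1).getText().equals("notes")}? VAR_VALUE COMMA value=VAR_VALUE #delim_body;

COMMA:  ',';
DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE:    '!>>';
VAR_VALUE:  ANBang+;

fragment ANBang:    AlphaNum | Bang;
fragment AlphaNum:  [a-zA-Z0-9];
fragment Bang:  '!';

WS    : [ \t\r\n]+ -> skip ;
```

Here notes is no longer reserved by the lexer, so it can still appear as an ordinary value after the comma.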