I'm using ANTLR4 to parse text adventure game dialogue files written in Yarn, which means mostly free-form text and lots of island grammars. For the most part things are going smoothly, but I'm having trouble excluding certain text inside the Shortcut mode (used when presenting options for the player to choose from).
Basically, I need a rule that matches anything except #, newline, or <<. When it hits <<, it needs to either move into a new mode for handling expressions of various kinds, or simply leave the current mode so that the << gets picked up by the already existing rules.
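One way to pin down the intended behaviour is a regular expression with a negative lookahead: a '<' is accepted only when the next character is not another '<'. The sketch below uses Python's re module purely as an illustration. ANTLR lexer rules have no lookahead operator, so this does not translate directly into a grammar rule. The names NAIVE, LOOKAHEAD and leading_text are hypothetical.

```python
import re

# Excluding every '<' outright (the "first thought" approach).
NAIVE = re.compile(r"[^#\n<]+")
# Anything except '#', newline or '<<': a lone '<' is allowed only when the
# next character is not another '<' (negative lookahead).
LOOKAHEAD = re.compile(r"(?:[^#\n<]|<(?!<))+")

def leading_text(pattern, s):
    """Return the prefix of s consumed by pattern ('' if it matches nothing).

    Each alternative consumes exactly one character, so the greedy '+'
    yields the longest possible prefix.
    """
    m = pattern.match(s)
    return m.group() if m else ""
```

With these definitions, leading_text(NAIVE, " :<") returns ' :' while leading_text(LOOKAHEAD, " :<") returns ' :<', and both stop just before the '<<' in the Peter's house line.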
A cut-down version of my lexer (omitting the rules for expressions):
lexer grammar YarnLexer;
NEWLINE : ('\n') -> skip;
CMD : '<<' -> pushMode(Command);
SHORTCUT : '->' -> pushMode(Shortcut);
HASHTAG : '#' ;
LINE_GOBBLE : . -> more, pushMode(Line);
mode Line;
LINE : ~('\n'|'#')* -> popMode;
mode Shortcut ;
TEXT : CHAR+ -> popMode;
fragment CHAR : ~('#'|'\n'|'<');
mode Command ;
CMD_EXIT : '>>' -> popMode;
// RULES FOR OPERATORS/IDs/NUMBERS/KEYWORDS/etc
CMD_TEXT : ~('>')+ ;
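The following Python sketch simulates how the modes are meant to interact on shortcut lines, using an explicit mode stack in place of pushMode/popMode. It rests on several assumptions and does not reproduce ANTLR's actual algorithm. The function name lex_yarn is hypothetical, and only the token types in the cut-down lexer above are modelled. TEXT uses the desired "anything except #, newline or <<" behaviour rather than the current CHAR fragment.

```python
import re

# Hypothetical simulation of the cut-down YarnLexer; token names follow the grammar.
TEXT = re.compile(r"(?:[^#\n<]|<(?!<))+")   # desired TEXT: anything except '#', '\n', '<<'
CMD_TEXT = re.compile(r"[^>]+")             # like CMD_TEXT : ~('>')+
LINE = re.compile(r"[^\n][^#\n]*")          # like LINE_GOBBLE (.) followed by LINE

def lex_yarn(source):
    """Return (token_type, text) pairs, tracking lexer modes on a stack."""
    tokens, modes, pos = [], ["DEFAULT"], 0

    def emit(kind, text):
        nonlocal pos
        tokens.append((kind, text))
        pos += len(text)

    while pos < len(source):
        mode = modes[-1]
        if mode == "DEFAULT":
            if source.startswith("->", pos):
                emit("SHORTCUT", "->")
                modes.append("Shortcut")          # pushMode(Shortcut)
            elif source.startswith("<<", pos):
                emit("CMD", "<<")
                modes.append("Command")           # pushMode(Command)
            elif source[pos] == "#":
                emit("HASHTAG", "#")
            elif source[pos] == "\n":
                pos += 1                          # NEWLINE -> skip
            else:
                emit("LINE", LINE.match(source, pos).group())
        elif mode == "Shortcut":
            m = TEXT.match(source, pos)
            if m is None:
                raise ValueError(f"no TEXT at {pos}: {source[pos:]!r}")
            emit("TEXT", m.group())
            modes.pop()                           # popMode back to DEFAULT
        else:  # Command mode
            if source.startswith(">>", pos):
                emit("CMD_EXIT", ">>")
                modes.pop()                       # popMode
            else:
                m = CMD_TEXT.match(source, pos)
                if m is None:
                    raise ValueError(f"no CMD_TEXT at {pos}: {source[pos:]!r}")
                emit("CMD_TEXT", m.group())
    return tokens
```

For "-> Peter's house <<if $metPeter == true >>" this yields SHORTCUT, TEXT ' Peter's house ', CMD, CMD_TEXT 'if $metPeter == true ' and CMD_EXIT, while "-> :<" still produces a single TEXT ' :<'. Popping out of Shortcut mode after TEXT is what lets the default-mode CMD rule see the '<<'.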
And the parser grammar (again ignoring all the rules for expressions):
parser grammar YarnParser;
options { tokenVocab=YarnLexer; }
dialogue: statement+ EOF;
statement : line_statement | shortcut_statement | command_statement ;
hashtag : HASHTAG LINE ;
line_statement : LINE hashtag? ;
shortcut_statement : SHORTCUT TEXT command_statement? hashtag?;
command_statement : CMD expression CMD_EXIT;
expression : CMD_TEXT ;
I have tested the Command mode on its own and everything in it works fine, but when I try to parse my example input:
Where should we go?
-> the park
-> the zoo
-> Peter's house <<if $metPeter == true >>
ok shall we take the bus?
-> :<
-> ok
<<set $daySpent = true>>
My issue is that the line:
-> Peter's house <<if $metPeter == true >>
gets matched entirely as TEXT, and the CMD rule is ignored in favour of the much longer TEXT match.
My first thought was to add < to the excluded set (as in the CHAR fragment above), but then I can't have text like:
-> :<
which should be perfectly valid. Any idea how to do this?
… << (and >>) into the list of "forbidden" tokens of the rule CHAR? Something like fragment CHAR: ~('\n' | '#' | '<<' | '>>'); ? – Raven
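A caveat on that suggestion: as far as I know, ANTLR's ~ operator only negates sets of single characters, and ANTLR 4 reports an error for a multi-character literal such as '<<' inside a lexer set. So that rule cannot express "not the sequence <<". Regular-expression character classes have the same limitation. This Python sketch (the names WITH_PAIRS, SINGLES and prefix are hypothetical) shows that writing a character twice inside a class just collapses to that single character.

```python
import re

# Both classes denote the same set of single characters: repeating '<' or '>'
# inside [...] does not create a two-character "<<" or ">>" exclusion.
WITH_PAIRS = re.compile(r"[^\n#<<>>]+")
SINGLES = re.compile(r"[^\n#<>]+")

def prefix(pattern, s):
    """Return the prefix of s consumed by pattern ('' if it matches nothing)."""
    m = pattern.match(s)
    return m.group() if m else ""
```

Both patterns stop at the lone '<' in " :<" and at the single '>' in " a > b", which is exactly the behaviour the question is trying to avoid.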