I'm using ANTLR4 to parse text adventure game dialogue files written in Yarn, so the input is mostly free-form text with lots of island grammars. For the most part things are going smoothly, but I am having trouble excluding certain text inside the Shortcut mode (used when presenting options for the player to choose from).

Basically I need to write a rule to match anything except #, a newline, or <<. When the lexer hits a <<, it needs either to move into a new mode for handling expressions of various kinds, or simply to leave the current mode so that the << gets picked up by the already existing rules.

A cut-down version of my lexer (ignoring rules for expressions):

lexer grammar YarnLexer;

NEWLINE : ('\n') -> skip;

CMD : '<<' -> pushMode(Command);
SHORTCUT : '->' -> pushMode(Shortcut);

HASHTAG : '#' ;

LINE_GOBBLE : . -> more, pushMode(Line);

mode Line;
LINE : ~('\n'|'#')* -> popMode;

mode Shortcut ;
TEXT : CHAR+ -> popMode;
fragment CHAR : ~('#'|'\n'|'<');

mode Command ;
CMD_EXIT : '>>' -> popMode;

// RULES FOR OPERATORS/IDs/NUMBERS/KEYWORDS/etc
CMD_TEXT : ~('>')+ ;

And the parser grammar (again ignoring all the rules for expressions):

parser grammar YarnParser;

options { tokenVocab=YarnLexer; }

dialogue: statement+ EOF;

statement : line_statement | shortcut_statement | command_statement ;

hashtag : HASHTAG LINE ;

line_statement : LINE hashtag? ;

shortcut_statement : SHORTCUT TEXT command_statement? hashtag?;

command_statement : CMD expression CMD_EXIT;
expression : CMD_TEXT ;

I have tested the Command mode when it is by itself and everything inside there is working fine, but when I try to parse my example input:

Where should we go?
-> the park
-> the zoo
-> Peter's house <<if $metPeter == true >>

ok shall we take the bus?
-> :<
-> ok

<<set $daySpent = true>>

my issue is the line:

-> Peter's house <<if $metPeter == true >>

gets matched completely as TEXT, and the CMD rule is ignored in favour of the far longer TEXT match.

My first thought was to add < to the set, but then I can't have text like:

-> :<

which should be perfectly valid. Any idea how to do this?

Have you tried including << (and >>) in the list of "forbidden" tokens of the CHAR rule? Something like fragment CHAR: ~('\n' | '#' | '<<' | '>>');? - Raven
Yeah, that was the first thing I tried, but it turns out you can't have multi-character literals in sets in ANTLR. - McJones

1 Answer

Adding the single left angle bracket to the exclusion list leaves exactly one corner case, a lone < that is not part of a <<, and it is easily handled by matching it with its own rule and retyping it as TEXT:

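TEXT : CHAR+ ;                       // runs of ordinary text, stopping at '<'
CMD  : '<<' -> pushMode(Command);    // longest match: '<<' beats a lone '<'
LAB  : '<'  -> type(TEXT) ;          // a lone '<' is emitted as a TEXT token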

fragment CHAR : ~('\n' | '#' | '<') ;
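
To see why this works, here is a rough plain-Python simulation of the longest-match behaviour these rules rely on. It is only a sketch and not ANTLR's actual implementation: the function name lex_shortcut and the two rule tables are made up for illustration, the regular expressions are hand translations of the TEXT, CMD, LAB, CMD_EXIT and CMD_TEXT rules shown in this thread, and handling of '#' and newlines is assumed to happen elsewhere.

```python
import re

# Hand-translated rules for the Shortcut mode, in grammar order.
# The lexer takes the longest match; on a tie the earlier rule wins.
SHORTCUT_RULES = [
    ("TEXT", re.compile(r"[^\n#<]+")),   # TEXT : CHAR+
    ("CMD", re.compile(r"<<")),          # CMD  : '<<' -> pushMode(Command)
    ("TEXT", re.compile(r"<")),          # LAB  : '<'  -> type(TEXT)
]

# Cut-down Command mode from the question.
COMMAND_RULES = [
    ("CMD_EXIT", re.compile(r">>")),     # CMD_EXIT : '>>' -> popMode
    ("CMD_TEXT", re.compile(r"[^>]+")),  # CMD_TEXT : ~('>')+
]


def lex_shortcut(text):
    """Tokenize the text that follows '->' on a single shortcut line.

    Hypothetical helper: '#' and newlines are not handled here.
    """
    tokens = []
    pos = 0
    rules = SHORTCUT_RULES
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so the
            # earlier rule is kept when two matches have equal length.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
        # Crude stand-in for pushMode(Command) / popMode.
        if name == "CMD":
            rules = COMMAND_RULES
        elif name == "CMD_EXIT":
            rules = SHORTCUT_RULES
    return tokens


for line in (" Peter's house <<if $metPeter == true >>", " :<"):
    print(lex_shortcut(line))
```

One thing the simulation makes visible: a line such as -> :< comes out as two consecutive TEXT tokens (the text before the bracket, then the bracket itself), so the parser rule for shortcut_statement would probably need TEXT+ rather than a single TEXT. Note also that TEXT no longer pops the mode in this version, so something else (for example a rule on the newline) presumably has to leave the Shortcut mode.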