I'm trying to represent the BYOND DM language strings in lexer form (See http://byond.com and http://byond.com/docs/ref). Here are the rules for strings:
- The string starts and ends with double quotes, i.e. `"hello world"` evaluates to `hello world`.
- A backslash acts as an escape character, which can escape the end quote, i.e. `"hello\"world"` evaluates to `hello"world`.
- Newlines in the string can be ignored by ending the line with a backslash, i.e. `"hello\` followed by a line break and then `world"` evaluates to `helloworld`.
- If the string opens/closes with the sequence `{"` / `"}` respectively, newlines are allowed and entered into the final string. The sequence `\\\n` (a backslash followed by a newline) is still ignored.
- The string can contain embedded expressions inside square brackets, which are formatted into the result. Backslashes can escape the opening bracket, i.e. `"hello [ "world" ] \["` evaluates to `hello world [` at run-time. Any expression can go in the brackets (calls, math, etc.).
- If the starting quote/curly brace is prefixed with `@`, escape sequences and embedded expressions are disabled for the string, i.e. `@{"hello [worl\d"}` and `@"hello [worl\d"` both evaluate to `hello [worl\d`.
I am trying to construct ANTLR4 .g4 lexer rules to tokenize these strings. I figure there's 4 (or more) token types I'd need:
- Normal string, i.e. `"hello world"`, `@"hello world"`, `@{"hello world"}` or `{"hello world"}`.
- String start before embedded expression, i.e. `"hello [` or `{"hello [`.
- String end after embedded expression, i.e. `] world"` or `] world"}`.
- String in between two embedded expressions, i.e. `] hello world [`.
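To make the intended split concrete, here is a small Python sketch. It is not ANTLR and not BYOND's actual lexer, just a reference for what the token stream should look like. The function names `tokenize` and `_add_expr` and the `EXPR` kind are hypothetical. The other kind names are borrowed from the rule names in the attempts below. It assumes plain `"..."` strings only (no `@` prefix, no `{"..."}` form) and keeps a stack so that a string nested inside an embedded expression, or an ordinary index bracket, does not end the outer string early.

```python
def _add_expr(tokens, text):
    """Record raw expression text, skipping pure whitespace (hypothetical helper)."""
    text = text.strip()
    if text:
        tokens.append(("EXPR", text))


def tokenize(src):
    """Split DM-like source into string pieces and raw expression text.

    Assumption: only plain "..." strings are handled, not @"..." or {"..."}.
    """
    tokens = []
    stack = []   # open contexts, innermost last: "str" or "expr"
    start = 0    # index where the current piece began
    i = 0
    while i < len(src):
        ch = src[i]
        in_string = bool(stack) and stack[-1] == "str"
        if in_string:
            if ch == "\\":
                i += 2              # a backslash escapes the next character
                continue
            if ch == '"' or ch == "[":
                piece = src[start:i + 1]
                after_expr = piece.startswith("]")
                if ch == '"':       # the string ends here
                    kind = "RSTRING" if after_expr else "FSTRING"
                    stack.pop()
                else:               # an embedded expression opens here
                    kind = "CSTRING" if after_expr else "LSTRING"
                    stack.append("expr")
                tokens.append((kind, piece))
                start = i + 1
        elif ch == '"':
            _add_expr(tokens, src[start:i])
            stack.append("str")
            start = i
        elif ch == "[":
            stack.append("expr")    # ordinary bracket inside an expression
        elif ch == "]" and stack:
            stack.pop()
            if stack and stack[-1] == "str":
                _add_expr(tokens, src[start:i])
                start = i           # the "]" begins the next string piece
        i += 1
    _add_expr(tokens, src[start:])
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize(r'"hello [ "world" ] \["'):
        print(kind, text)
```

On the sample from the rules above, `"hello [ "world" ] \["`, this yields an `LSTRING` for `"hello [`, an `FSTRING` for `"world"`, and an `RSTRING` for `] \["`. The stack is the part a set of context-free lexer rules cannot express on its own, which is why a `]` cannot be classified without knowing whether a string is currently open.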
Here are my (incomplete and unsuccessful) attempts:

    LSTRING: '"' ('\\[' | ~[[\r\n])* '[';
    RSTRING: ']' ('\\"' | ~["\r\n])* '"';
    CSTRING: ']' ('\\[' | ~[[\r\n])* '[';
    FSTRING: '"' ('\\"' | ~["\r\n])* '"';

If this can't be solved in the lexer, I can write the parser rules on my own with the tokens `@`, `{"`, `"}`, `[`, `]`, `\\`, and `"`. But I figured I'd give this a shot, since it'd be more performant.