1
votes

I want to write a rule for parsing a string inside double quotes. Any character should be allowed, with one condition: there MUST be a line continuation character \ wherever the string is split across multiple lines.

Example:

variable = "first line \n second line \
            still second line \n \
            third line"

If the line continuation character is not found before a newline character is found, I want the parser to barf.

My current rule is this:

STRING  : '"' (ESC|.)*? '"';
fragment ESC : '\\' [btnr"\\] ;

So I am allowing the string to contain any character, including a bunch of escape sequences. But I am not enforcing that the line continuation character \ is required when the text is split across lines.

How can I make the grammar enforce that rule?

2 Answers

3
votes

Even though there is already an accepted answer, let me put in my 2 cents. I strongly recommend not handling this type of error in a lexer rule, because you will not be able to give the user a good error message. First, lexer errors are usually not reported separately in ANTLR4; they appear as follow-up parser errors. Second, the error produced (likely something like "no viable alt at \n") is anything but helpful.

The better solution is to accept both variants (line break with or without the escape) and do a semantic check afterwards. Then you know exactly what is wrong and can tell the user what you actually expected.
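To make the "semantic check afterwards" idea concrete, here is a minimal sketch in Python. It assumes the lexer already accepts both variants (the question's original STRING rule does, since `.` also matches a newline) and that you have the raw token text in hand. The function name `find_unescaped_newlines` is made up for this illustration and is not part of ANTLR.

```python
def find_unescaped_newlines(token_text):
    """Return the 0-based offsets of newlines that are not escaped by a backslash."""
    offsets = []
    i = 0
    while i < len(token_text):
        ch = token_text[i]
        if ch == "\\":
            # A backslash escapes whatever follows it. Treat backslash + CRLF
            # as a single line continuation so Windows line endings work too.
            if token_text[i + 1 : i + 3] == "\r\n":
                i += 3
            else:
                i += 2
            continue
        if ch == "\n":
            # A bare newline: the user forgot the line continuation character.
            offsets.append(i)
        i += 1
    return offsets


if __name__ == "__main__":
    good = '"first line \\n second line \\\n still second line"'
    bad = '"first line\nsecond line"'
    print(find_unescaped_newlines(good))
    print(find_unescaped_newlines(bad))
```

With the offsets available, you can report something like "missing line continuation character before the line break" at the exact position, instead of relying on a generic lexer or parser error.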

1
votes

Solution

fragment ESCAPE
    : '\\' .
    ;
STRING
    : '"' (ESCAPE | ~[\n"])* '"'
    ;

Explanation

The ESCAPE fragment matches a backslash followed by any single character. In particular, it matches a backslash followed by a newline, which is what makes the backslash act as a line continuation character.

Between the double quotation marks, the STRING token matches:

  • Escaped characters (fragment ESCAPE)
  • Any character except a newline or a double quotation mark.
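To see which inputs this rule accepts without generating a lexer, the same pattern can be approximated with a Python regular expression. This is only a sketch: `STRING_RE` and `is_valid_string` are names invented for this example, and a regex engine does not behave exactly like an ANTLR lexer (no longest-match tokenization), so it only checks whether a whole candidate string fits the pattern.

```python
import re

# Mirrors: '"' (ESCAPE | ~[\n"])* '"'  with  ESCAPE : '\\' .
# re.DOTALL lets "." match a newline, like "." in an ANTLR lexer rule.
STRING_RE = re.compile(r'"(?:\\.|[^\n"])*"', re.DOTALL)


def is_valid_string(text):
    """Return True if the whole text fits the STRING pattern."""
    return STRING_RE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        '"one line"',
        '"first \\\nsecond"',   # backslash, then newline: continuation
        '"first\nsecond"',      # bare newline: rejected
        '"say \\"hi\\""',       # escaped quotes inside the string
    ]
    for sample in samples:
        print(repr(sample), is_valid_string(sample))
```

One thing to watch for: with Windows line endings, a backslash followed by \r\n escapes only the \r, so the \n that follows is still a bare newline and the pattern rejects the string. If your input can contain CRLF, the rule would need to account for that.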