I'm trying to create a new rule in the R grammar for Raw Strings.

Quoting the R NEWS:

There is a new syntax for specifying raw character constants similar to the one used in C++: r"(...)" with ... any character sequence not containing the sequence )". This makes it easier to write strings that contain backslashes or both single and double quotes. For more details see ?Quotes.

Examples:

## A Windows path written as a raw string constant:
r"(c:\Program files\R)"

## More raw strings:
r"{(\1\2)}"
r"(use both "double" and 'single' quotes)"
r"---(\1--)-)---"

But I'm unsure whether a grammar file alone is enough to implement the rule. So far I have tried something like this, based on older suggestions for similar grammars:

Parser:

|   RAW_STRING_LITERAL #e42

Lexer:

RAW_STRING_LITERAL
        : ('R' | 'r') '"' ( '\\' [btnfr"'\\] | ~[\r\n"]|LETTER )* '"' ; 

Any hints or suggestions are appreciated.

R ANTLR Grammar:

https://github.com/antlr/grammars-v4/blob/master/r/R.g4

Original R Grammar in Bison:

https://svn.r-project.org/R/trunk/src/main/gram.y

1 Answer

To match the start and end delimiters, you will have to use target-specific code. In Java, that could look like this:

@lexer::members {
  boolean closeDelimiterAhead() {
    // Get the part between `r"` and `(`
    String delimiter = getText().substring(2, getText().indexOf('('));

    // Construct the end of the raw string
    String stopFor = ")" + delimiter + "\"";

    for (int n = 1; n <= stopFor.length(); n++) {
      if (this._input.LA(n) != stopFor.charAt(n - 1)) {
        // No end ahead yet
        return false;
      }
    }

    return true;
  }
}

RAW_STRING
 : [rR] '"' ~[(]* '(' ( {!closeDelimiterAhead()}? . )* ')' ~["]* '"'
 ;

This tokenizes r"---( )--" )----" )---" as a single RAW_STRING token.
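
To see what the predicate is doing without generating a lexer, the same delimiter logic can be sketched in plain Java. This is only an illustration: the class RawStringDemo and the method endOfRawString are hypothetical names, not part of ANTLR or of the grammar, and like the rule above the sketch handles only the r"delim( ... )delim" form.

```java
public class RawStringDemo {
    /**
     * Hypothetical helper: returns the index just past the closing quote of
     * the raw string starting at `start`, or -1 if no complete raw string
     * starts there. Mirrors the lexer rule: everything between r" and the
     * first '(' is the delimiter, and the string ends at the first
     * occurrence of ")" + delimiter + "\"".
     */
    static int endOfRawString(String src, int start) {
        if (start + 1 >= src.length()) {
            return -1;
        }
        char prefix = src.charAt(start);
        if ((prefix != 'r' && prefix != 'R') || src.charAt(start + 1) != '"') {
            return -1;
        }

        // Locate the opening parenthesis; the text before it is the delimiter
        int open = src.indexOf('(', start + 2);
        if (open < 0) {
            return -1;
        }
        String delimiter = src.substring(start + 2, open);

        // Construct the end of the raw string, as closeDelimiterAhead() does
        String stopFor = ")" + delimiter + "\"";

        int close = src.indexOf(stopFor, open + 1);
        return close < 0 ? -1 : close + stopFor.length();
    }

    public static void main(String[] args) {
        String src = "r\"---( )--\" )----\" )---\" + x";
        int end = endOfRawString(src, 0);
        // Prints the raw string literal without the trailing " + x"
        System.out.println(src.substring(0, end));
    }
}
```

The shorter sequences )--" and )----" inside the literal are skipped because neither equals )---", which is why the lexer needs the lookahead check rather than a plain character class.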

EDIT

And since the delimiters can only consist of hyphens plus a pair of parentheses, braces, or brackets, rather than arbitrary characters, a recursive fragment rule should work as well, with no target-specific code:

RAW_STRING
 : [rR] '"' INNER_RAW_STRING '"'
 ;

fragment INNER_RAW_STRING
 : '-' INNER_RAW_STRING '-'
 | '(' .*? ')'
 | '{' .*? '}'
 | '[' .*? ']'
 ;
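
For comparison outside ANTLR, the balanced-hyphen constraint can be approximated with a Java regular expression that uses a backreference. This is a rough sketch, not a replacement for the lexer rule: the class RawStringRegexDemo and the method matchAtStart are hypothetical names, and the pattern assumes the same double-quoted form as the fragment above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawStringRegexDemo {
    // (-*) captures the leading run of hyphens, and \1 demands the same run
    // again before the closing quote. DOTALL lets '.' match newlines too.
    private static final Pattern RAW_STRING = Pattern.compile(
        "[rR]\"(-*)(?:\\(.*?\\)|\\{.*?\\}|\\[.*?\\])\\1\"",
        Pattern.DOTALL);

    /** Returns the raw string literal at the start of `src`, or null if there is none. */
    static String matchAtStart(String src) {
        Matcher m = RAW_STRING.matcher(src);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Balanced: three hyphens on each side
        System.out.println(matchAtStart("r\"---(\\1--)-)---\" rest"));
        // Braces as the bracket pair, no hyphens
        System.out.println(matchAtStart("r\"{(\\1\\2)}\""));
        // Unbalanced hyphens: no match, so this prints null
        System.out.println(matchAtStart("r\"--(abc)-\""));
    }
}
```

The backreference plays the role of the recursion in INNER_RAW_STRING: each hyphen consumed before the opening bracket has to be matched by one after the closing bracket.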