I have an ANTLR v4 grammar in a .NET application. An object is either an array or a String. An array is a list of zero or more objects enclosed in square brackets. A String is a sequence of characters enclosed in parentheses. A String may contain unescaped balanced parentheses, but it must not contain any unbalanced left or right parentheses; those can be included using the escape sequences \( and \). Because \ introduces an escape sequence, a literal backslash must itself be escaped as \\.

I have tried to write the grammar in such a way that balanced parentheses are simply recursive Strings within Strings, with a base case that disallows parentheses except in an escape sequence.

grammar Sample ;

root
    : 'BT' object+ 'ET' EOF
    ;

object
    : array
    | String
    ;

array
    : '[' object* ']'
    ;

String
    : '(' ( StringCharacter | String )* ')'
    ;

fragment StringCharacter
    : EscapeSequence
    | ~[()\\]
    ;

fragment EscapeSequence
    : '\\('
    | '\\)'
    | '\\'
    ;

Whitespace : [ \t\r\n] -> skip ;

The grammar above works for these inputs:

BT [] ET
BT () ET
BT (\)) ET
BT () () ET
BT (one) (two) ET
BT [(one) (two)] ET
BT (one) [(two)] ET
BT (\() [(two)] ET
BT () [(\))] ET
BT (\)) (\)) ET

but it fails for this one:

BT (\() [(\))] ET

In this case, I am trying to encode a String with a single escaped left parenthesis then an array with a single element that's a String with a single escaped right parenthesis.

The error message states:

line: 1:13 extraneous input ']' expecting {'ET', '[', String}

How should I change the grammar to achieve my goal?

1 Answer

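The third alternative of the EscapeSequence lexer rule was missing a second pair of backslashes. In an ANTLR literal, '\\' denotes a single backslash character, so the original rule accepted a lone \ as a complete escape sequence. To match an escaped backslash (\\ in the input), the literal has to be '\\\\':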

fragment EscapeSequence
    : '\\('
    | '\\)'
    | '\\\\'
    ;
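
To see why the lone-backslash alternative produced the error at column 13, here is a rough Python sketch that imitates longest-match behaviour for the String rule. It is only an approximation written for illustration, not ANTLR's actual algorithm, and the names longest_string_match, SAMPLE, BUGGY and FIXED are made up for this example. With the original escapes, \ can be consumed on its own, which leaves the following parenthesis unescaped, so the longest possible String runs across the space and the [ as well:

```python
from functools import lru_cache

SAMPLE = r"BT (\() [(\))] ET"
BUGGY = ("\\(", "\\)", "\\")     # third alternative is a lone backslash
FIXED = ("\\(", "\\)", "\\\\")   # third alternative is an escaped backslash


def longest_string_match(text, start, escapes):
    """Length of the longest String match at `start`, or 0 if there is none."""

    @lru_cache(maxsize=None)
    def string_ends(i):
        # Every position at which a String that begins at i could end.
        if i >= len(text) or text[i] != "(":
            return frozenset()
        ends, seen, frontier = set(), set(), [i + 1]
        while frontier:
            p = frontier.pop()
            if p in seen or p >= len(text):
                continue
            seen.add(p)
            if text[p] == ")":
                ends.add(p + 1)                 # closing parenthesis
            for esc in escapes:
                if text.startswith(esc, p):
                    frontier.append(p + len(esc))   # EscapeSequence
            if text[p] not in "()\\":
                frontier.append(p + 1)          # ~[()\\]
            frontier.extend(string_ends(p))     # nested String
        return frozenset(ends)

    return max(string_ends(start), default=start) - start


n = longest_string_match(SAMPLE, 3, BUGGY)
print(n, SAMPLE[3:3 + n])   # 10 (\() [(\))
n = longest_string_match(SAMPLE, 3, FIXED)
print(n, SAMPLE[3:3 + n])   # 4 (\()
```

Under the original rule the first String token swallows everything up to column 13, so the parser sees ']' straight after a String, which matches the reported message. With '\\\\' the first token stops after (\() and the array is tokenized as intended.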