0
votes

I want to create a grammar rule for a simplified version of a block statement in C, which matches a list of statements in braces with optional newlines at the beginning and end. Statements in this language are terminated by a newline character.

The optional newlines are so that block statements can span multiple as well as single lines. i.e, both

{ statement }

and

{
    statement
}

should be supported.

Currently my rules are as follows:

BlockStmt:
    '{' OptionalNewlines BlockStmtList OptionalNewlines '}';


OptionalNewlines:
        OptionalNewlines '\n'
    |   %empty;

Empty blocks are also supported, which are basically blocks with just newlines in them and no statements. This is possible because BlockStmtList can reduce to %empty.

However for empty blocks with just newlines, this leads to a shift-reduce conflict as the newlines can be matched by both the beginning and the ending OptionalNewlines non-terminal.

How do I tell yacc to prioritise one of the OptionalNewlines in the case of an empty block with just newlines?

1
Are you using newlines as statement terminators instead of semicolons? Or do you still have semicolons? If you have semicolons, why bother even passing the newlines to the parser?rici
forgot to clarify, I am using newlines as statement terminatorsashenoy
Please edit your question, thanks.rici

1 Answers

1
votes

Unless you have a problem with blank lines in the middle of a block -- something which many of us like to do -- the simple solution is to just allow empty statements (that is, a statement consisting only of the newline terminator. If you do that, you can stop worrying about optional newlines and just use

BlockStmt: '{' BlockStmtList '}';

That's far and away the easiest. But if it doesn't work for you, read on.

In general, you cannot have a sequence of optional lists where two of the lists have the same elements. That leads to an ambiguity: if your grammar allows a* b* a* (using Kleene * for simplicity) and the input is a, there is no way to know whether the empty a* is before or after the empty b*. "Optional" elements are problematic in many situations; it's often necessary to expand empty non-terminals into multiple rules using non-optional elements:

BlockStmt: '{' '}'
         | '{' NewLineList '}'
         | '{' NewLineList StmtList OptionalNewLineList '}'
         | '{' StmtList OptionalNewLineList '}'