Shift-reduce conflicts in a simple(?) grammar

Question

I am trying to describe a grammar in bison, but I am unsure whether it can be done. My intended grammar is this:

%token A B C D SEP

%%

items          : /* empty */
               | items_nonempty
               ;

items_nonempty : item
               | items_nonempty SEP item
               ;

item           :       B
               |       B       SEP D
               |       B SEP C
               |       B SEP C SEP D
               | A SEP B
               | A SEP B       SEP D
               | A SEP B SEP C
               | A SEP B SEP C SEP D
               ;

"items" is a (possibly empty) sequence of item elements, separated by SEP tokens.

Each item consists of up to 4 tokens (A B C D), in that order, separated by SEP. The A, C, and D tokens in an item are optional.

Note the re-use of the same separator token SEP within each item, and between the items themselves.

I hope the intended grammar is clear. I think it is unambiguous, but I am quite unsure whether it is restricted enough to be parseable by bison; unfortunately, my parser knowledge is quite rusty.

Using the grammar as given, bison reports 4 shift/reduce conflicts. Looking at the .output report, I understand where they occur and why, but I am at a loss as to whether, and how, the intended grammar can be rewritten to get rid of the S/R conflicts.

I am unwilling to use an %expect declaration. Likewise, I am unwilling to have my scanner consume the separator tokens rather than pass them on to the parser.

Any hints on how to sanitise this grammar would be greatly appreciated.

Chris Dodd · Accepted Answer · 2012-04-06T00:24:11

The basic problem is that the grammar as written needs TWO tokens of lookahead. When the parser has read, say, a B and sees a SEP as its lookahead, it cannot tell whether that SEP ends the current item (so it should reduce the item) or introduces another piece of the same item (so it should shift). Only the token after the SEP decides that.
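To make the two-token point concrete, here is a small C sketch (not part of the original answer) that looks only at the eight item forms from the question, written without their SEPs. It lists the forms that are a complete item and also a proper prefix of a longer item; at each of those, a SEP lookahead alone cannot tell reduce from shift. It finds four (B, B SEP C, A SEP B, A SEP B SEP C), which lines up with the four shift/reduce conflicts bison reports. The second helper shows that the token after the SEP settles the choice. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The eight item forms from the question, with the SEPs left out:
 * "BC" stands for B SEP C, "ABCD" for A SEP B SEP C SEP D, and so on. */
static const char *const item_forms[] = {
    "B", "BD", "BC", "BCD", "AB", "ABD", "ABC", "ABCD"
};
#define N_FORMS (sizeof item_forms / sizeof item_forms[0])

/* Nonzero if some longer item form starts with `form`. For a form that is
 * itself a complete item, a following SEP could then either end the item
 * (reduce) or continue it (shift). */
static int has_longer_form(const char *form) {
    size_t len = strlen(form);
    for (size_t i = 0; i < N_FORMS; i++) {
        const char *other = item_forms[i];
        if (strlen(other) > len && strncmp(other, form, len) == 0)
            return 1;
    }
    return 0;
}

/* Collects the complete item forms where one token of lookahead (the SEP)
 * is not enough; returns how many were written to `out`. */
static size_t conflict_points(const char **out) {
    size_t k = 0;
    for (size_t i = 0; i < N_FORMS; i++)
        if (has_longer_form(item_forms[i]))
            out[k++] = item_forms[i];
    return k;
}

/* With the second token of lookahead (the one after the SEP) the choice is
 * fixed: the SEP continues the current item iff form + next is still a
 * prefix of some item form; otherwise it starts the next item. */
static int sep_continues_item(const char *form, char next) {
    char buf[8];
    size_t len = strlen(form);
    assert(len + 1 < sizeof buf);   /* item forms have at most 4 tokens */
    memcpy(buf, form, len);
    buf[len] = next;
    buf[len + 1] = '\0';
    for (size_t i = 0; i < N_FORMS; i++)
        if (strncmp(item_forms[i], buf, len + 1) == 0)
            return 1;
    return 0;
}
```

For example, sep_continues_item("B", 'C') returns 1 and sep_continues_item("B", 'A') returns 0: exactly the decision an LALR(1) parser cannot make while its only lookahead is the SEP.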

There are a number of approaches you can try:

Use btyacc, or bison's GLR support (%glr-parser), to effectively get more lookahead.
Write the grammar to accept an arbitrary SEP-separated list of single tokens and then use a post-pass to regroup them into items of 1-4 tokens, each with exactly one B, rejecting malformed groups (this is Gunther's suggestion).
Use the scanner to do more lookahead: instead of returning plain SEP tokens, return SEP_BEFORE_A_OR_B or SEP_NOT_BEFORE_A_OR_B depending on which token follows the SEP.
Combine tokens in the scanner: return SEP_C and SEP_D as single tokens (a separator followed by a C or D).
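The scanner-lookahead approach (relabelling each SEP according to the token after it) could be sketched as a thin filter between the scanner and the parser. This is a hypothetical C sketch: raw_lex() just replays a token array in place of a real flex scanner, and END, lex_init(), raw_lex() and filtered_lex() are names invented for the illustration; only SEP_BEFORE_A_OR_B and SEP_NOT_BEFORE_A_OR_B come from the answer. Every separator still reaches the parser as a token, so the question's constraint is respected. The comment at the top sketches how the question's grammar would use the two tokens; note that the SEP between A and B becomes SEP_BEFORE_A_OR_B, because a B follows it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Grammar sketch using the relabelled separators (bison syntax, in a comment;
 * "items : empty | items_nonempty" stays as in the question):
 *
 *   items_nonempty : item
 *                  | items_nonempty SEP_BEFORE_A_OR_B item ;
 *   item : B
 *        | B SEP_NOT_BEFORE_A_OR_B D
 *        | B SEP_NOT_BEFORE_A_OR_B C
 *        | B SEP_NOT_BEFORE_A_OR_B C SEP_NOT_BEFORE_A_OR_B D
 *        | A SEP_BEFORE_A_OR_B B
 *        | A SEP_BEFORE_A_OR_B B SEP_NOT_BEFORE_A_OR_B D
 *        | A SEP_BEFORE_A_OR_B B SEP_NOT_BEFORE_A_OR_B C
 *        | A SEP_BEFORE_A_OR_B B SEP_NOT_BEFORE_A_OR_B C SEP_NOT_BEFORE_A_OR_B D ;
 */

/* Token codes; END and the numeric values are assumptions of this sketch. */
enum token { A, B, C, D, SEP, SEP_BEFORE_A_OR_B, SEP_NOT_BEFORE_A_OR_B, END };

/* Hypothetical raw scanner state: replays a fixed token array, then END. */
static const enum token *raw_toks;
static size_t raw_len, raw_pos;

/* One token of pushback, filled when the filter peeks past a SEP. */
static int have_peeked;
static enum token peeked;

static void lex_init(const enum token *toks, size_t n) {
    assert(toks != NULL || n == 0);
    raw_toks = toks;
    raw_len = n;
    raw_pos = 0;
    have_peeked = 0;
}

static enum token raw_lex(void) {
    return raw_pos < raw_len ? raw_toks[raw_pos++] : END;
}

/* What the parser would call: identical to raw_lex() except that every SEP
 * is reported as one of the two relabelled separators, chosen by the token
 * that follows it. That token is kept and returned on the next call. */
static enum token filtered_lex(void) {
    enum token t;
    if (have_peeked) {
        have_peeked = 0;
        t = peeked;
    } else {
        t = raw_lex();
    }
    if (t != SEP)
        return t;
    peeked = raw_lex();
    have_peeked = 1;
    return (peeked == A || peeked == B) ? SEP_BEFORE_A_OR_B
                                        : SEP_NOT_BEFORE_A_OR_B;
}
```

For A SEP B SEP C SEP B SEP D the filter yields A, SEP_BEFORE_A_OR_B, B, SEP_NOT_BEFORE_A_OR_B, C, SEP_BEFORE_A_OR_B, B, SEP_NOT_BEFORE_A_OR_B, D. After a complete item the parser can then reduce on SEP_BEFORE_A_OR_B and shift on SEP_NOT_BEFORE_A_OR_B, so one token of lookahead suffices.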
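The post-pass approach (accept a flat list, then regroup) might look like the following C sketch. It assumes the parser has accepted a flat SEP-separated list of A/B/C/D tokens and stored them in an array with the separators dropped; the rule names in the comment, regroup() and its calling convention are all hypothetical. It walks the list once, opening a new item at every A and at every B that does not directly follow an A, and rejects anything that does not fit the shape [A] B [C] [D].

```c
#include <assert.h>
#include <stddef.h>

/* Flat grammar assumed on the bison side (hypothetical rule names):
 *   tokens   : %empty | tok_list ;
 *   tok_list : tok | tok_list SEP tok ;
 *   tok      : A | B | C | D ;
 * The parser collects the A/B/C/D tokens into an array, dropping the SEPs. */
enum token { A, B, C, D };

/* Splits the flat token list into items of the form [A] B [C] [D].
 * item_start (room for at least n entries) receives the index of the first
 * token of each item. Returns the number of items, or -1 if the tokens
 * cannot be grouped that way. */
static int regroup(const enum token *toks, size_t n, size_t *item_start) {
    int items = 0;
    int last = -1;  /* previous token, or -1 before the first one */
    assert(n == 0 || (toks != NULL && item_start != NULL));
    for (size_t i = 0; i < n; i++) {
        switch (toks[i]) {
        case A:
            if (last == A)
                return -1;               /* the previous item never got its B */
            item_start[items++] = i;     /* A always opens a new item */
            break;
        case B:
            if (last != A)
                item_start[items++] = i; /* B opens a new item unless it follows A */
            break;
        case C:
            if (last != B)
                return -1;               /* C must come right after B */
            break;
        case D:
            if (last != B && last != C)
                return -1;               /* D must come after B or C */
            break;
        default:
            return -1;
        }
        last = (int)toks[i];
    }
    return last == A ? -1 : items;       /* a trailing A has no B */
}
```

For A SEP B SEP C SEP B SEP D SEP B the token array is {A, B, C, B, D, B}, and regroup() reports three items starting at indices 0, 3 and 5. The grammar itself stays trivially LALR(1); the cost is that item structure errors are reported by this pass rather than by bison.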
