flex&bison shift/reduce conflict

Question

Here are part of my grammar:

expr_address
: expr_address_category expr_opt    { $$ = new ExprAddress($1,*$2);}
| axis axis_data                    { $$ = new ExprAddress($1,*$2);}
;
axis_data
: expr_opt          { $$ = $1;}
| sign              { if($1 == MINUS)
                        $$ = new IntergerExpr(-1000000000);
                      else if($1 == PLUS)
                        $$ = new IntergerExpr(+1000000000);}
;
expr_opt
:               { $$ = new IntergerExpr(0);}
| expr          { $$ = $1;}
;
expr_address_category
: I                 { $$ = NCAddress_I;}
| J                 { $$ = NCAddress_J;}
| K                 { $$ = NCAddress_K;}
;
axis
: X                 { $$ = NCAddress_X;}
| Y                 { $$ = NCAddress_Y;}
| Z                 { $$ = NCAddress_Z;}
| U                 { $$ = NCAddress_U;}
| V                 { $$ = NCAddress_V;}
| W                 { $$ = NCAddress_W;}
;
expr
: '[' expr ']'              {$$ = $2;}
| COS parenthesized_expr    {$$ = new BuiltinMethodCallExpr(COS,*$2);}
| SIN parenthesized_expr    {$$ = new BuiltinMethodCallExpr(SIN,*$2);}
| ATAN parenthesized_expr   {$$ = new BuiltinMethodCallExpr(ATAN,*$2);}
| SQRT parenthesized_expr   {$$ = new BuiltinMethodCallExpr(SQRT,*$2);}
| ROUND parenthesized_expr  {$$ = new BuiltinMethodCallExpr(ROUND,*$2);}
| variable                  {$$ = $1;}
| literal                           
| expr '+' expr                 {$$ = new BinaryOperatorExpr(*$1,PLUS,*$3);}
| expr '-' expr                 {$$ = new BinaryOperatorExpr(*$1,MINUS,*$3);}
| expr '*' expr                 {$$ = new BinaryOperatorExpr(*$1,MUL,*$3);}
| expr '/' expr                 {$$ = new BinaryOperatorExpr(*$1,DIV,*$3);}
| sign expr %prec UMINUS        {$$ = new UnaryOperatorExpr($1,*$2);}
| expr EQ expr                  {$$ = new BinaryOperatorExpr(*$1,EQ,*$3);}
| expr NE expr                  {$$ = new BinaryOperatorExpr(*$1,NE,*$3);}
| expr GT expr                  {$$ = new BinaryOperatorExpr(*$1,GT,*$3);}
| expr GE expr                  {$$ = new BinaryOperatorExpr(*$1,GE,*$3);}
| expr LT expr                  {$$ = new BinaryOperatorExpr(*$1,LT,*$3);}
| expr LE expr                  {$$ = new BinaryOperatorExpr(*$1,LE,*$3);}
;
variable 
: d_h_address               {$$ = new AddressExpr(*$1);}
;
d_h_address
: D INTEGER_LITERAL     { $$ = new IntAddress(NCAddress_D,$2);}
| H INTEGER_LITERAL     { $$ = new IntAddress(NCAddress_H,$2);}
;

I hope my grammar support that like:

H100=20;
X;
X+0;
X+;
X+H100;   //means H100 variable ref

The top two are same with X0; By the way,sign -> +/-;

But bison report conflicts,the key part of bison.output:

State 108

11 expr: sign . expr
64 axis_data: sign .

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 64 (axis_data)]
H         [reduce using rule 64 (axis_data)]
$default  reduce using rule 64 (axis_data)

State 69

62 expr_address: axis . axis_data

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 65 (expr_opt)]
H         [reduce using rule 65 (expr_opt)]
$default  reduce using rule 65 (expr_opt)

State 68

61 expr_address: expr_address_category . expr_opt

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 65 (expr_opt)]
H         [reduce using rule 65 (expr_opt)]
$default  reduce using rule 65 (expr_opt)

I don't know how to deal with this,thanks advance.

EDIT： I make a minimal grammar：

    %{
    #include <stdio.h>
    extern "C" int yylex();
    void yyerror(const char *s) { printf("ERROR: %s/n", s); }
%}

%token PLUS '+'  MINUS '-' 

%token D H I J K X Y Z INT

/*%type sign expr var expr_address_category expr_opt
%type axis */

%start word_list

%%
/*Above grammar lost this rule,it makes ambiguous*/
word_list
    : word
    | word_list word
    ;
sign
    : PLUS
    | MINUS
    ;
expr
    : var
    | sign expr
    | '[' expr ']'
    ;
var 
    : D INT
    | H INT
    ;
word
    : expr_address
    | var '=' expr
    ;
expr_address
    : expr_address_category expr_opt
    /*| '(' axis sign ')'*/
    | axis sign
    ;
expr_opt
    : /* empty */
    | expr
    ;
expr_address_category
    : I 
    | J
    | K
    | axis
    ;
axis
    : X
    | Y
    | Z
    ;
%%

and I hope it can support：

X;
X0;
X+0;  //the top three are same with X0
X+;
X+H100;  //this means X's data is ref +H100;
X+H100=10; //two word on a block,X+ and H100=10;
XH100=10;  //two word on a block,X and H100=10;

EDIT2:

The above EDIT lost this rule.

block
    : word_list ';' 
    | ';'
    ;

Because I have to allow such grammar:

H000 = 100 H001 = 200 H002 = 300;

I think we need more of the grammar, to see the context in which expr_address can appear. Also, I didn't understand how to apply "the top two are the same with X0" to H100=20; — rici
I checked the fragment you provided; as I expected, it has no conflicts. In order to check it, I had to add declarations of various token types, provide a definition of sign and parenthesized_expr (I used '(' expr ')'), and eliminate the literal option from expr as I have no idea what that would be. There are various assumptions there so obviously if they are wrong, there might be conflicts as a result. Note that How to Ask indicates the need for a minimal reproducible example; it's worth taking the time to read those requests because they will help you ask questions which can be answered. — rici
@rici I think the conflict like this: 'X+'H100 or 'X'+H100,whict means X+ reduce as 'axis axis_data',the other 'X' also can reduce as 'axis axis_data',it creates such as shift more one H or reduces as 'axis axis_data' conflict. — Shun
@rici I am sorry,X0 and X+0 are same....I edit it and forget to change it. — Shun
Your grammar -- the part you show -- does not allow an expr_address to be followed by an expr so interpreting X+ to axis axis_data is not possible with a lookahead of H. As I said, the grammar fragment works fine on its own so the problem is an interaction with some rule you haven't shown us. — rici

rici rici · Accepted Answer · 2017-03-14T15:44:27

This is essentially the classic LR(2) grammar, except that in your case it is LR(3) because your variables consist of two tokens [Note 1]:

var : D INT | H INT

The basic problem is the concatenation of words without separators:

word_list : word | word_list word

combined with the fact that one of the options for word ends with an optional var:

word: expr_address
expr_address: expr_address_category expr_opt

while the other one starts with a var:

word: var '=' expr

The = makes this unambiguous, since nothing in an expr can contain that symbol. But at the point where a decision needs to be made, the = is not visible, because the lookahead is the first token of a var -- either an H or a D -- and the equals sign is still two tokens away.

This LR(2) grammar is very similar to the grammar used by yacc/bison itself, a fact which I always find to be ironic, because the grammar for yacc does not require ; between productions:

production: SYMBOL ':' | production SYMBOL  /* Lots of detail omitted */

As with your grammar, this makes it impossible to know whether a SYMBOL should be shifted or trigger a reduce because the disambiguating : is still not visible.

Since the grammar is (I assume) unambiguous, and bison can now generate GLR parsers, that will be the simplest solution: just add

%glr-parser

to your bison prologue (but read the section of the bison manual on GLR parsers to understand the trade-off).

Note that the shift-reduce conflicts will still be reported as warnings; since it is impossible to reliably decide whether a grammar is ambiguous, bison doesn't attempt to do so and ambiguities will be reported at run-time if they exist.

You should also fix the issue mentioned in @ChrisDodd's answer regarding the refactoring of expr_address (although with a GLR parser it is not strictly necessary).

If, for whatever reason, you feel that a GLR parser will not meet your needs, you could use the solution in most implementations of yacc (including bison), which is a hack in the lexical scanner. The basic idea is to mark whether a symbol is followed by a colon or not in the lexer, so that the above production could be rewritten as:

production: SYMBOL_COLON | production SYMBOL

This solution would work for you if you were willing to combine the letter and the number into a single token:

word: expr_address expr_opt
    | VARIABLE_EQUALS expr
// ...
expr: VARIABLE

My preference is to do this transformation in a wrapper around the lexer, which keeps a (one-element) queue of pending tokens:

/* The use of static variables makes this yylex wrapper unreliable
 * if it is reused after a syntax error.
 */
int yylex_wrapper() {
  static int saved_token = -1;
  static YYSTYPE saved_yylval = {0};

  int token = saved_token;
  saved_token = -1;
  yylval = saved_yylval;
  // Read a new token only if we don't have one in the queue.
  if (token < 0) token = yylex();
  // If the current token is IDENTIFIER, check the next token
  if (token == IDENTIFIER) {
    // Read the next token into the queue (saved_token / saved_yylval)
    YYSTYPE temp_val = yylval;
    saved_token = yylex();
    saved_yylval = yylval;
    yylval = temp_val;
    // If the second token is '=', then modify the current token
    // and delete the '=' from the queue
    if (saved_token == '=') {
        saved_token = -1;
        token = IDENTIFIER_EQUALS;
    }
  }
  return token;
}

Notes

Personally, I would start by making a var a single token (do you really want to allow people to write:
```
H /* Some comment in the middle of the variable name */ 100
```
but that's not going to solve any problems; it merely reduces the grammar's lookahead requirement from LR(3) to LR(2).

flex&bison shift/reduce conflict

2 Answers

Notes