I've been trying to resolve a seemingly simple shift/reduce conflict, to no avail. Naturally, the parser works fine if I just ignore the conflict, but I'd feel much safer if I reorganized my rules. Here, I've reduced a relatively complex grammar to just the rules that produce the conflict:

statement_list
  : statement_list statement 
  | 
  ;

statement
  : lvalue '=' expression
  | function
  ;

lvalue
  : IDENTIFIER
  | '(' expression ')'
  ;

expression
  : lvalue
  | function
  ;

function
  : IDENTIFIER '(' ')'
  ;

With the verbose option in yacc, I get this output file describing the state with the mentioned conflict:

state 2

    lvalue  ->  IDENTIFIER .   (rule 5)
    function  ->  IDENTIFIER . '(' ')'   (rule 9)

    '('  shift, and go to state 7

    '('  [reduce using rule 5 (lvalue)]
    $default reduce using rule 5 (lvalue)

Thank you for any assistance.

1 Answer

The problem is that this grammar requires two tokens of lookahead for the parser to know when it has reached the end of a statement. If you have input of the form:

ID = ID ( ID ) = ID

After the parser shifts the second ID (with ( as the lookahead token), it can't tell whether that ID ends the first statement (so the ( begins a second statement) or begins a function. So it shifts, continuing to parse a function, which is the wrong thing to do with the example input above.

If you extend function to allow an argument inside the parentheses, and expression to allow actual expressions, things become worse: the lookahead required is unbounded, because the parser needs to get all the way to the second = to determine that this is not a function call.
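
For illustration, such an extension might look like the sketch below. The single-argument alternative is hypothetical and not part of the grammar in the question:

```yacc
function
  : IDENTIFIER '(' ')'
  | IDENTIFIER '(' expression ')'   /* hypothetical: one optional argument */
  ;
```

With this rule, the prefix ID = ID ( ID ) of the example input is itself a complete statement (an assignment whose right-hand side is a call), so only the = that follows shows that ( ID ) was really the lvalue of a second statement. Since the parenthesized expression can nest to any depth, no fixed number of lookahead tokens is enough.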

The basic problem here is that there's no helper punctuation to aid the parser in finding the end of a statement. Since text that is the beginning of a valid statement can also appear in the middle of a valid statement, finding statement boundaries is hard.
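
If you are free to change the language, one way to add that helper punctuation is a statement terminator. A minimal sketch, assuming ';' as the terminator (my choice of token, not something from the original grammar), with all other rules left unchanged:

```yacc
statement
  : lvalue '=' expression ';'   /* ';' is an assumed terminator token */
  | function ';'
  ;
```

With the terminator in place, '(' can no longer directly follow a complete lvalue: a finished lvalue is followed only by '=', ';' or ')'. In the state after IDENTIFIER the parser can then shift on '(' and reduce on the other tokens, so the conflict in state 2 should disappear.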