1
votes

I have the following lex definitions:

[a-zA-Z][a-zA-Z0-9_]*       return NAME;

\,              return COMMA;
\:              return COLON;
\;              return SEMICOLON;
\(              return OPAREN;
\)              return CPAREN;
\+              return PLUS;

And the following yacc production rules:

program: 
    | program statement;

arglist:
    OPAREN CPAREN
    | OPAREN expressionlist CPAREN;

trailed:
    NAME
    | trailed arglist;

expression:
    trailed
    | expression PLUS trailed;

expressionlist:
    expression
    | expressionlist COMMA expression;

statement:
    expression SEMICOLON
    |NAME arglist COLON expression SEMICOLON;

Everything compiles well if I comment out the last rule. With the last rule I get a conflict:

yacc: 1 shift/reduce conflict.

So I guess, yacc cannot decide whether to shift the next symbol onto the stack or to reduce the stack with a give rule.

  1. Is my grammar ambiguous?

  2. Shouldn't the decision between rule "trailed: trailed arglist" and "statement: NAME arglist COLON expression SEMICOLON" be without conflict, because the former never has a colon, while the latter always has?

  3. Has this something to do with the size of the look-ahead buffer?

  4. How can I fix this grammar to parse both "a (b) ();" and "a (b, c): b + c;" as valid statements?

  5. How can I backtrack the conflict in a more detailed manner?

---- EDIT

Concerning MichaelMoser's answer:

Changing

arglist:
    OPAREN CPAREN
    | OPAREN expressionlist CPAREN;

expressionlist:
    expression
    | expressionlist COMMA expression;

to

arglist: OPAREN expressionlist CPAREN;

expressionlist:
| expressionlist COMMA expression; //this now allows for expression lists like ,a,b but NVM

as suggested doesn't help. The conflict still arises with the second rule for statement active, and once commenting that rule out, no conflict is given.

3

3 Answers

2
votes

As others have noted, the problem is that you need more than one token of lookahead to differentiate between a function definition and a function call. The problem with the grammar as written is that it needs to decide between reducing the rule trailed: NAME and shifting to match the rule statement: NAME arglist COLON expression SEMICOLON after seeing a NAME when the lookahead is OPAREN. But it can't decide until after it sees the arglist to see if there's a COLON after it or not (which is what distinguishes the two cases).

To fix this, you need to refactor the grammar so that there's no need to reduce anything only present on one alternative until you get to the COLON. With this grammar, you can do this by refactoring the trailed rule to always require at least one arglist, and making a NAME with no arglist a separate expression rule:

trailed:
    NAME arglist
    | trailed arglist;

expression:
    NAME
    | trailed
    | expression PLUS NAME
    | expression PLUS trailed;

Now when your get an input NAME OPAREN ... there's no need to reduce anything yet -- you just shift into the rule matching an arglist and after matching the arglist, you can see the next token and decide whether this is a function call or a function definition.

1
votes

If you want to debug shift/reduce error, then add --report=all --report-file=pars.report to the bison command line, the resulting pars.report file will make things clear to you.

The reduced grammar file

%token OPAREN CPAREN NAME PLUS COMMA SEMICOLON COLON

%%
program:
    | program statement;

arglist:
    OPAREN CPAREN
    | OPAREN expressionlist CPAREN;

trailed:     
    NAME
    | trailed arglist;

expression:
    trailed
    | expression PLUS trailed;

expressionlist:
    expression      
    | expressionlist COMMA expression;

statement:     
    expression SEMICOLON
    |NAME arglist COLON expression SEMICOLON;

gives the following report:

State 3 conflicts: 1 shift/reduce

....

state 3

    3 arglist: . OPAREN CPAREN
    4        | . OPAREN expressionlist CPAREN
    5 trailed: NAME .  [OPAREN, PLUS, SEMICOLON]
   12 statement: NAME . arglist COLON expression SEMICOLON

    OPAREN  shift, and go to state 7

    OPAREN    [reduce using rule 5 (trailed)]
    $default  reduce using rule 5 (trailed)

    arglist  go to state 8

at the start of the report there is the parser state that is causing the error, then later down you can see the details of the parsing state. This file is also interesting because it actually explains what the parser is doing at each step.

The conflict is between

statement : NAME arglist COLON expression SEMICOLON;

and

statement : expression;
expression : trailed;
trailed : NAME | tailed arglist;
0
votes

Given the source grammar:

%token COLON COMMA CPAREN NAME OPAREN PLUS SEMICOLON

%%

program: 
    | program statement
    ;

arglist:
      OPAREN CPAREN
    | OPAREN expressionlist CPAREN
    ;

trailed:
      NAME
    | trailed arglist
    ;

expression:
      trailed
    | expression PLUS trailed
    ;

expressionlist:
      expression
    | expressionlist COMMA expression
    ;

statement:
      expression SEMICOLON
    | NAME arglist COLON expression SEMICOLON
    ;

The report I get from bison -v xyz.y gives xyz.y: conflicts: 1 shift/reduce, but the description in xyz.output is somewhat different from what MichaelMoser reports in his answer.

State 3 conflicts: 1 shift/reduce

…

state 3

    5 trailed: NAME .
   12 statement: NAME . arglist COLON expression SEMICOLON

    OPAREN  shift, and go to state 7

    OPAREN    [reduce using rule 5 (trailed)]
    $default  reduce using rule 5 (trailed)

    arglist  go to state 8

This means that when the grammar has read a NAME and gets an OPAREN, it cannot deduce whether it is in a trailed that will be followed by an arglist or is in a statement with a NAME followed by an arglist. It won't be able to determine the difference until it reaches the COLON, which is much too far away to be determined by one token of lookahead.

This makes the grammar unresolvable with plain Yacc because Yacc can only look ahead one token. (I'm not sure whether that makes it ambiguous or something else — "does not work in Yacc" covers the situation. Bison provides GLR grammars that might help.)