3
votes

I've written following bison grammar file:

%left '+' '-'
%left '*' '/'

%token NUMBER

%%

expr
: NUMBER
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr     expr %prec '*' /* implicit multiplication */
;

Now bison reports shift/reduce conflict regarding expr : expr expr. I've extracted problem to following minimal set:

%left OP

%%

expr
: 'a'
| expr expr %prec OP
;

I can't get why is bison still complains about shift/reduce conflict. I've found some old mail archive: Re: bison/yacc: shift/reduce conflict using %prec for composition, but I don't understand author's explanations either.

Could someone clarify why this grammar is ambiguous, and how to resolve conflict?

EDIT: by NUMBER NUMBER I mean NUMBER * NUMBER, i.e. product of this two numbers.

2
What do you expect adjacent numbers to do?Jonathan Leffler
@JonathanLeffler, I'll multiply them. Actually, I want to realize implicit multiplication, so that star symbol may be omitted in e.g. "4*(1+2)" -> "4(1+2)UncleAli

2 Answers

3
votes

The problem here is that the token NUMBER doesn't have a precedence. So when there's a state that can either shift a NUMBER or reduce a rule (regardless of whether that rule has a precedence) it can't decide which to do.

Now you can fix it for this grammar by adding a precedence for NUMBER (make it the same as *), but it will come back if you add any rules for expression that start with any token other than NUMBER -- for example if you add expr: '(' expr ')', you'll get shift/reduce conflicts on '('.

An even bigger problem is if you add unary prefix operators, such as expr: '-' expr. In this case, you won't get a conflict as '-' already has a precedence, but an input like NUMBER - NUMBER will get parsed as NUMBER ( - NUMBER ), which is probably not at all what you want. There's no good way of dealing with this issue with precedence rules.

The underlying cause of this confusion is that bison's precedence rules don't resolve precedence by comparing two rules as you might naively expect based on using the word "precedence". Instead they work by comparing the "precedence" of a rule to be reduced with that of a token to be shifted and base the decision to shift or reduce based on that. At the point in the parse where this happens, the second rule hasn't yet been recognized; instead bison is just guessing what it might be based on the token.

2
votes

Part of the answer is to look hard at the output file from bison -v. For your first grammar, I got these extracts:

State 8 conflicts: 1 shift/reduce
State 9 conflicts: 1 shift/reduce
State 10 conflicts: 1 shift/reduce
State 11 conflicts: 1 shift/reduce
State 12 conflicts: 1 shift/reduce

Grammar

    0 $accept: expr $end

    1 expr: NUMBER
    2     | expr '+' expr
    3     | expr '-' expr
    4     | expr '*' expr
    5     | expr '/' expr
    6     | expr expr

So there are 5 shift/reduce conflicts in the grammar. These are the less serious type of conflict; you can state that you expect them with %expect 5 in the grammar, if you're convinced that what the grammar is doing is correct.

state 0

    0 $accept: . expr $end

    NUMBER  shift, and go to state 1

    expr  go to state 2

state 1

    1 expr: NUMBER .

    $default  reduce using rule 1 (expr)

state 2

    0 $accept: expr . $end
    2 expr: expr . '+' expr
    3     | expr . '-' expr
    4     | expr . '*' expr
    5     | expr . '/' expr
    6     | expr . expr

    $end    shift, and go to state 3
    '+'     shift, and go to state 4
    '-'     shift, and go to state 5
    '*'     shift, and go to state 6
    '/'     shift, and go to state 7
    NUMBER  shift, and go to state 1

    expr  go to state 8

state 3

    0 $accept: expr $end .

    $default  accept

state 4

    2 expr: expr '+' . expr

    NUMBER  shift, and go to state 1

    expr  go to state 9

States 5, 6, 7 mimic state 4 but for the other operators. State 8 is the first of the states with a shift/reduce conflict. Remember the . (dot) in the rules indicates where the parser is when it reaches this state.

state 8

    2 expr: expr . '+' expr
    3     | expr . '-' expr
    4     | expr . '*' expr
    5     | expr . '/' expr
    6     | expr . expr
    6     | expr expr .

    NUMBER  shift, and go to state 1

    NUMBER    [reduce using rule 6 (expr)]
    $default  reduce using rule 6 (expr)

    expr  go to state 8

state 9

    2 expr: expr . '+' expr
    2     | expr '+' expr .
    3     | expr . '-' expr
    4     | expr . '*' expr
    5     | expr . '/' expr
    6     | expr . expr

    '*'     shift, and go to state 6
    '/'     shift, and go to state 7
    NUMBER  shift, and go to state 1

    NUMBER    [reduce using rule 2 (expr)]
    $default  reduce using rule 2 (expr)

    expr  go to state 8

There are differences and similarities between these two states, but states 10, 11, 12 match state 9 except for the different point of ambiguity.

The trouble is that when the grammar sees:

NUMBER OP NUMBER NUMBER

it cannot tell whether to parse that as:

( NUMBER OP NUMBER ) NUMBER expr expr

or as:

NUMBER OP ( NUMBER NUMBER )
 expr  OP       expr

Given that it is a shift/reduce conflict in each case, it chooses to shift. If that's what you want, then add the %expect 5 and get on with life. If that's not what you want, then you need to rethink your grammar. What does a pair of adjacent numbers indicate, and are you sure you don't need some operator character (perhaps a comma or colon) to separate them?


I tried raising the precedence of the missing operator by using:

%left MISSING

after the other precedence declarations, and then using:

expr expr %prec MISSING

This did not change anything. Neither does making the precedence of MISSING very low by listing it before the other operators.

You get an inkling of the problems if you consider how an expression like this should be parsed:

NUMBER OP NUMBER NUMBER NUMBER OP NUMBER NUMBER OP NUMBER

Where the OP is the same in each appearance. My brain is hurting! So is bison's!