Bison shift reduce conflict on comma

Question

I have strange shift-reduce warnings no matter what I change. Reduced grammar:

expr : NUMBER 
     | NUMBER ',' expr 
     | expr ',' NUMBER ',' NUMBER

Bison reports shift reduce on 2nd rule with comma. I tried to set precedence:

%nonassoc num_p
%nonassoc exp_p

expr : NUMBER %prec num_p
     | NUMBER ',' expr %prec exp_p 
     | expr ',' NUMBER ',' NUMBER

but warning stays the same. Can someone explain me what am I missing here?

That is not enough information. You need to at least show the productions which participate in the conflict. — rici
@rici there are 500+ rules. I wrote only the ones that has to do with above non-terminals. So: expr : NUMBER | NUMBER ',' expr | expr ',' NUMBER ',' NUMBER — Miloš Ljumović
That is very different from your question or your earlier comment. If it is really part of your grammar, then it is the cause of the conflict. I suggest that you edit your question, so that it can be answered. — rici

rici rici · Accepted Answer · 2016-05-10T16:04:28

It is clear that the following is ambiguous:

expr : NUMBER %prec num_p
     | NUMBER ',' expr %prec exp_p 
     | expr ',' NUMBER ',' NUMBER

since any list of three or more numbers can be parsed in various ways. Roughly speaking, we can take single numbers off the beginning of the list or pairs of numbers off the end of the list, until we meet somewhere in the middle; however, there is no definition of where the middle point might be.

Consider, for example, the various parse trees which could produce 1, 2, 3, 4, 5. Here are just two (with numbers indicating which production was used to expand expr):

    expr(2)                       expr(3)
    /    \                       /   |  \
  /       \                     /    |   |  
 |    expr(2)                  /     |   |  
 |     /   \                  /      |   |
 |    /     \             expr(3)    |   |
 |   /    expr(3)          / | \     |   |
 |   |    /  | \          /  |  \    |   |
 |   |expr(1)|  \     expr(1)|   |   |   |
 |   |   |   |   |       |   |   |   |   |
 1 , 2 , 3 , 4 , 5       1 , 2 , 3 , 4 , 5

Both of the above trees are, in some sense, maximal. The one on the left takes as many single NUMBERs as possible using production 2, until only two NUMBERs are left for production 3. The one on the right applies production 3 as many times as possible. (It would have needed a single application of production 2 if the list of numbers had an even length.)

In order to resolve the ambiguity, we need a clear statement of the intent. But it seems to me unlikely that it can be resolved with a precedence declaration. Remember that precedence relations are always between a possible reduction (on the top of the parser stack) and a lookahead symbol. They never compare two lookahead symbols nor two productions. If the lookahead symbol wins, it is shifted onto the stack; if the reduction wins, the stack is reduced. It is no more complicated than that.

So if precedence could help, the relevant token must be ',', not NUMBER. NUMBER must always be shifted onto the parse stack. Since no production ends with ',', it is never possible to reduce the stack when NUMBER is the lookahead symbol. By contrast, when ',' is the lookahead symbol, it is usually possible either to reduce the top of the parser stack or to shift the ',' in preparation for a different reduction.

The only place where this decision is even possible is in the case where we have seen NUMBER and are looking at a ',', so we have to decide whether to apply production 1, in preparation for production 3, or shift the ',', leaving production 2 as the only option. Neither of these decisions can prosper: If we choose to reduce, it is possible that production 3 will turn out to be impossible because there are not enough numbers in the list; if we choose to shift, then production 3 will never be used.

In a left-to-right parse, the algorithm which produces the right-hand parse above is not possible, because we cannot know whether the list is of even or odd length until we reach the end, at which point it is far too late to retroactively reduce the first two numbers. On the other hand, the left-hand parse will require a lookahead of four tokens, not one, in order to decide at which point to stop using production 2. That makes it possible to construct an LR(1) grammar, because there is a way to rewrite any LR(k) grammar as an LR(1) grammar, but the resulting grammar is usually complicated.

I suspect that neither of these was really your intention. In order to recommend a resolution, it will be necessary to know what the precise intention is.

One possibility (motivated by a comment) is that expr also includes something which is neither a number nor a list of numbers:

expr: NUMBER
    | complex_expression

In that case, it might be that the grammar intends to capture two possibilities:

a list containing NUMBERs with a possibly complex_expression at the end;
a list containing an even number of NUMBERs with a possibly complex_expression at the beginning.

What is left ambiguous in the above formulation is a list consisting only of NUMBERs, since either the first or last number could be parsed as an expr. Here there are only a couple of reasonable possible resolutions:

a list of NUMBERs is always be parsed as the first option (expr at the end)
a list of NUMBERs is parsed as the second option (expr at the beginning) if and only if there are an odd number of elements in the list.

The first resolution is much easier, so we can start with it. In this case, the first element in the list essentially determines how the list will be parsed, so it not possible to reduce the first NUMBER to expr. We can express that by separating the different types of expr:

expr: list_starting_expr | list_ending_expr
list_starting_expr: complex_expression ',' NUMBER ',' NUMBER
                  | list_start_expr ',' NUMBER ',' NUMBER
list_ending_expr  : complex_expression
                  | NUMBER ',' list_ending_expr 
                  | NUMBER

The last production in the above example allows for a list entirely of NUMBERs to be parsed as a list_ending_expr.

It also allows a list containing only a single complex_expression to be parsed as a list_ending_expr, while a list_starting_expr is required to have at least three elements. Without that, a list consisting only of a complex_expression would have been ambiguous. In the example grammar in the question, the list containing only a complex_expression is implicitly forbidden; that could be reproduced by changing the base production for list_ending_expr from list_ending_expr: complex_expression to list_ending_expr: NUMBER ',' complex_expression.

But what if we wanted the second resolution? We can still recognize the language, but constructing a correct parse tree may require some surgery. We can start by separating out the case where the list consists only of NUMBERs.

expr: list_starting_expr | list_ending_expr | ambiguous_list
list_starting_expr: complex_expression ',' NUMBER ',' NUMBER
                  | list_starting_expr ',' NUMBER ',' NUMBER
list_ending_expr  : NUMBER ',' complex_expression
                  | NUMBER ',' list_ending_expr 
ambiguous_list    : NUMBER
                  | NUMBER ',' ambiguous_list

Despite the frequently-repeated claim that right-recursion should be avoided in bottom-up grammars, here it is essential that ambiguous_list and list_ending_expr be right-recursive, because we cannot distinguish between the two possibilities until we reach the end of the list.

Now we could use a semantic action to simply count the number of elements in the list. That action needs to be associated with the reduction of ambiguous_list to expr:

expr: list_starting_expr | list_ending_expr
    | ambiguous_list {
        if (count_elements($1) % 2 == 1) {
          $$ = make_list_starting_expr($1);
        }
        else {
          $$ = make_list_starting_expr($1);
        }
      }

But we can actually distinguish the two cases grammatically, precisely because of the right recursion:

expr: list_starting_expr
    | list_ending_expr
    | even_list_of_numbers
    | odd_list_of_numbers
list_starting_expr  : complex_expression ',' NUMBER ',' NUMBER
                    | list_starting_expr ',' NUMBER ',' NUMBER
list_ending_expr    : NUMBER ',' complex_expression
                    | NUMBER ',' list_ending_expr 
odd_list_of_numbers : NUMBER
                    | NUMBER ',' NUMBER ',' odd_list_of_numbers
even_list_of_numbers: NUMBER ',' NUMBER 
                    | NUMBER ',' NUMBER ',' even_list_of_numbers

It might be more meaningful to write this as:

expr: expr_type_one | expr_type_two
expr_type_one: list_starting_expr | even_list_of_numbers
expr_type_two: list_ending_expr | odd_list_of_numbers
 /* Remainder as above */

Note: The above grammars, like the grammar in the original question, do not allow expr to consist only of a complex_expression. If that it were desired to handle the case of only a single complex_expression, then that syntax could be added directly to the productions for expr (or to whichever of expr_type_one or expr_type_two were appropriate.

Bison shift reduce conflict on comma

2 Answers