3
votes

A stripped down version of the grammar with the conflict:

body: variable_list function_list;
variable_list:
  variable_list variable | /* empty */
;
variable:
  TYPE identifiers ';'
;
identifiers:
  identifiers ',' IDENTIFIER | IDENTIFIER
;
function_list:
  function_list function | /* empty */
;
function:
  TYPE IDENTIFIER '(' argument_list ')' function_body
;

The problem is that variables and functions both start with TYPE and IDENTIFIER, e.g

int some_var;
int foo() { return 0; }

variables are always declared before functions in this language, but when a parse is attempted, it always gives

parse error: syntax error, unexpected '(', expecting ',' or ';' [after foo]

How can the variable_list be made to be less greedy, or have the parser realize that if the next token is a '(' instead of a ';' or ',' it is obviously a function and not a variable declaration?

The bison debug output for the conflict is

state 17

3 body: variable_list . function_list
27 variable_list: variable_list . variable

T_INT    shift, and go to state 27
T_BOOL   shift, and go to state 28
T_STR    shift, and go to state 29
T_VOID   shift, and go to state 30
T_TUPLE  shift, and go to state 31

T_INT     [reduce using rule 39 (function_list)]
T_BOOL    [reduce using rule 39 (function_list)]
T_STR     [reduce using rule 39 (function_list)]
T_VOID    [reduce using rule 39 (function_list)]
T_TUPLE   [reduce using rule 39 (function_list)]
$default  reduce using rule 39 (function_list)

variable       go to state 32
simpletype     go to state 33
type           go to state 34
function_list  go to state 35

I have tried all sorts of %prec statements to make it prefer reduce (although I am not sure what the difference would be in this case), with no success at making bison use reduce to resolve this, and I have also tried shuffling the rules around making new rules like non_empty_var_list and having body split up into function_list | non_empty_var_list function_list and none of the attempts would fix this issue. I'm new to this and I've run out of ideas of how to fix this, so I'm just completely baffled.

1

1 Answers

8
votes

the problem is in that variables and functions both start with TYPE and IDENTIFIER

Not exactly. The problem is that function_list is left-recursive and possibly empty.

When you reach the semi-colon terminating a variable with TYPE in the lookahead, the parser can reduce the variable into a variable_list, as per the first variable_list production. Now the next thing might be function_list, and function_list is allowed to be empty. So it could do an empty reduction to a function_list, which is what would be necessary to start parsing a function. It can't know not to do that until it looks at the '(' which is the third next token. That's far too far away to be relevant.

Here's an easy solution:

function_list: function function_list
             | /* EMPTY */
             ;

Another solution is to make function_list non-optional:

body: variable_list function_list
    | variable_list
    ;

function_list: function_list function
             | function
             ;

If you do that, bison can shift the TYPE token without having to decide whether it's the start of a variable or function definition.