0
votes

I've been stuck with some ambiguous grammar for a while now as yacc reports 6 shift/reduce conflicts. I've looked in the y.output file and have tried to understand how to look at the states and figure out what to do to fix the ambiguous grammar but to no avail. I'm legitimately stuck at how I'm supposed to fix the issues. I've looked at a lot of questions on stack overflow to see if other people's explanation would help me with my problem, but that hasn't helped me much either. For the record, I cannot use any precedence defining directives such as %left to solve the parsing conflicts.

Would someone be able to help me out by guiding me as to how I should change the grammar to fix the shift/reduce conflicts? Maybe by trying to resolve one of the issues and showing me the thinking process behind it? I know the grammar is quite long and hefty and I apologize in advance for that. If anyone is willing to spare their free time on this it would be greatly appreciated, but I realize that I may not be able to have that.

Anyways, here is my grammar in question (it is a slight expansion of the MiniJava grammar):

Grammar
0 $accept: program $end

1 program: main_class class_decl_list

2 main_class: CLASS ID '{' PUBLIC STATIC VOID MAIN '(' STRING '[' ']' ID ')' '{' statement '}' '}'

3 class_decl_list: class_decl_list class_decl
4                | %empty

5 class_decl: CLASS ID '{' var_decl_list method_decl_list '}'
6           | CLASS ID EXTENDS ID '{' var_decl_list method_decl_list '}'

7 var_decl_list: var_decl_list var_decl
8              | %empty

9 method_decl_list: method_decl_list method_decl
10                 | %empty

11 var_decl: type ID ';'

12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list statement_list RETURN exp ';' '}'

13 formal_list: type ID formal_rest_list
14            | %empty

15 formal_rest_list: formal_rest_list formal_rest
16                 | %empty

17 formal_rest: ',' type ID

18 type: INT
19     | BOOLEAN
20     | ID
21     | type '[' ']'

22 statement: '{' statement_list '}'
23          | IF '(' exp ')' statement ELSE statement
24          | WHILE '(' exp ')' statement
25          | SOUT '(' exp ')' ';'
26          | SOUT '(' STRING_LITERAL ')' ';'
27          | ID '=' exp ';'
28          | ID index '=' exp ';'
29 statement_list: statement_list statement
30               | %empty

31 index: '[' exp ']'
32      | index '[' exp ']'

33 exp: exp OP exp
34    | '!' exp
35    | '+' exp
36    | '-' exp
37    | '(' exp ')'
38    | ID index
39    | ID '.' LENGTH
40    | ID index '.' LENGTH
41    | INTEGER_LITERAL
42    | TRUE
43    | FALSE
44    | object
45    | object '.' ID '(' exp_list ')'

46 object: ID
47       | THIS
48       | NEW ID '(' ')'
49       | NEW type index

50 exp_list: exp exp_rest_list
51         | %empty

52 exp_rest_list: exp_rest_list exp_rest
53              | %empty
54 exp_rest: ',' exp

And here are the relevant states from y.output that have shift/reduce conflicts.

State 58
7 var_decl_list: var_decl_list . var_decl
12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list . statement_list RETURN exp ';' '}'

INT      shift, and go to state 20
BOOLEAN  shift, and go to state 21
ID       shift, and go to state 22

ID        [reduce using rule 30 (statement_list)]
$default  reduce using rule 30 (statement_list)

var_decl        go to state 24
type            go to state 25
statement_list  go to state 69


State 76
38 exp: ID . index
39    | ID . '.' LENGTH
40    | ID . index '.' LENGTH
46 object: ID .

'['  shift, and go to state 64
'.'  shift, and go to state 97

'.'       [reduce using rule 46 (object)]
$default  reduce using rule 46 (object)

index  go to state 98


State 100
33 exp: exp . OP exp
34    | '!' exp .

OP  shift, and go to state 103

OP        [reduce using rule 34 (exp)]
$default  reduce using rule 34 (exp)


State 101
33 exp: exp . OP exp
35    | '+' exp .

OP  shift, and go to state 103

OP        [reduce using rule 35 (exp)]
$default  reduce using rule 35 (exp)


State 102
33 exp: exp . OP exp
36    | '-' exp .

OP  shift, and go to state 103

OP        [reduce using rule 36 (exp)]
$default  reduce using rule 36 (exp)


State 120
33 exp: exp . OP exp
33    | exp OP exp .

OP  shift, and go to state 103

OP        [reduce using rule 33 (exp)]
$default  reduce using rule 33 (exp)

And there we have it. I apologize again for the length of this grammar and the number of shift/reduce conflicts. I just cannot seem to understand how to fix them by changing the grammar in question. Any help would be thoroughly appreciated, though if no one has time to look through such a massive post, I would understand. If anyone needs more information, don't hesitate to ask.

1

1 Answers

2
votes

The basic problem is that when parsing a method_decl body, it can't tell where the var_decl_list ends and the statement_list begins. This is because when the lookahead is ID, it doesn't know whether that is the start of another var_decl or the start of the first statement, and it needs to reduce an empty statement before it can start working on a statement_list.

There are a number of ways you can deal with this:

  • have the lexer return different tokens for type IDs and other IDs -- that way the difference will tell the parser which is next.

  • don't require an empty statement at the start of a statement list. Change the grammar to:

    statement_list: statement | statement_list statement ;
    opt_statement_list: statement_list | %empty ;
    

    and use opt_statement_list in the method_decl rule. This gets around the problem of having to reduce an empty statement_list before you start parsing statements. This is a process known as "unfactoring" the grammar as you are replacing rules with multiple variations. It makes the grammar more complex, and in this case, doesn't solve the problem, it just moves it; you'll then see shift/reduce conflicts betweeen statement: ID . index and type: ID on a [ lookahead. This problem can also be solved by unfactoring, but is harder.


So this brings up the general idea of resolving shift-reduce conflicts by unfactoring. The basic idea is to get rid of the rule causing the reduce half of the shift reduce conflict, replacing it with rules that are more limited in context, so don't trigger the conflict. The example above is easily solved by the "replace a 0-or-more recursive repeat with a 1-or-more recursive repeat and an optional rule". This works well for shift-reduce conflicts on the epsilon rule of the repeat if the following context means you can easily resolve when the 0-case should be legal (only when the next token is } in this case.)

The second conflict is tougher. Here the conflict is on reducing type: ID when the lookahead is [. So we need to duplicate type rules until that is not necessary. Something like:

type: simpleType | arrayType ;
simpleType: INT | BOOLEAN | ID ;
arrayType: INT '[' ']' | BOOLEAN '[' ']' | ID '[' ']'
         | arrayType '[' ']' ;

replaces the "0 or more repetitions of the '[' ']' suffix" with "1 or more" and works for similar reasons (defers the reduction until after seeing the '[' ']' instead of requiring it before.) The key being that the simpleType: ID rule never needs to be reduced when the lookahead is '[' as it is only valid in other contexts.