YACC Grammar: Operator precedence issue

Question

I'm trying to get: (20 + (-3)) * 3 / (20 / 3) / 2 to equal 4. Right now it equals 17.

So basically it's doing (20/3) then dividing that by 2, then dividing 3 by [(20/3)/2], then multiplying that by 17. Not sure how to alter my grammar/rules/precedences to get it to read correctly. Any guidance would be appreciated, thanks.

%% 

start:              PROGRAMtoken IDtoken IStoken compoundstatement

compoundstatement:          BEGINtoken {print_header();} statement semi ENDtoken {print_end();}

semi:                                   SEMItoken statement semi
                                        |

statement:                              IDtoken EQtoken exp
                                        { regs[$1] = $3; }

                                        | PRINTtoken exp
                                        { cout << $2 << endl; }

                                        | declaration

declaration:                        VARtoken IDtoken comma

comma:                              COMMAtoken IDtoken comma
                                    |

exp:                                    exp PLUStoken term
                                        { $$ = $1 + $3; }

                                        | exp MINUStoken term
                                        { $$ = $1 - $3; }

                                        | term
                                        { $$ = $1; }

                                        | MINUStoken term
                                        { $$ = -$2;}

term:                                   factor
                                        { $$ = $1;
                                        }
                                        | factor TIMEStoken term
                                        {$$ = $1 * $3;
                                        }
                                        | factor DIVIDEtoken term
                                        { $$ = $1 / $3;
                                        }

factor:                                 ICONSTtoken
                                        { $$ = $1;}
                                        | IDtoken
                                        {  $$ = regs[$1];  }
                                        | LPARENtoken exp RPARENtoken
                                        { $$ = $2;}

%%

My tokens and types look like:

%token BEGINtoken   
%token COMMAtoken

%left  DIVIDEtoken
%left  TIMEStoken

%token ENDtoken
%token EOFtoken 
%token EQtoken

%token <value> ICONSTtoken
%token <value> IDtoken

%token IStoken
%token LPARENtoken

%left  PLUStoken MINUStoken

%token PRINTtoken
%token PROGRAMtoken
%token RPARENtoken
%token SEMItoken

%token VARtoken 

%type <value> exp
%type <value> term
%type <value> factor

Brian Tompsett - 汤莱恩 Brian Tompsett - 汤莱恩 · Accepted Answer · 2015-02-14T12:54:21

You really wanted someone to work hard to give you an answer, which is why the question has hung around for a year. Have a read of this help page: https://stackoverflow.com/help/how-to-ask, in particular the part about simplifying the problem. There are lots of rules in your grammar file that were not needed to reproduce the problem. We did not need:

%token BEGINtoken   
%token COMMAtoken
%token ENDtoken
%token EOFtoken 
%token EQtoken
%token <value> IDtoken
%token IStoken
%token PROGRAMtoken
%token VARtoken 
%%
start:              PROGRAMtoken IDtoken IStoken compoundstatement

compoundstatement:          BEGINtoken {print_header();} statement semi ENDtoken {print_end();}

semi:                                   SEMItoken statement semi
                                        |
                                        | declaration

declaration:                        VARtoken IDtoken comma

comma:                              COMMAtoken IDtoken comma
                                    |

You could have just removed these tokens and rules to get to the heart of the operator precedence issue. We did not need any variables, declarations, assignments or program structure to illustrate the failure. Learning to simplify is the heart of competent debugging and thus programming. If you'd done this simplification more people would have had a go at answering. I'm saying this not for the OP, but for those that will follow with similar problems!

I'm wondering what school is setting this assignment, as I've seen a fair number of yacc questions on SO around the same dumb problem. I suspect more will come here every year, so answering this will help them. I knew what the issue was on inspection of the grammar, but to test my solution I had to code up a working lexer, some symbol table routines, a main program and other ancillary code. Again, another deterrent for problem solvers.

Lets get to the heart of the problem. You have these token declarations:

%left  DIVIDEtoken
%left  TIMEStoken
%left  PLUStoken MINUStoken

These tell yacc that if any rules are ambiguous that the operators associate left. Your rules for these operators are:

exp:                                    exp PLUStoken term
                                        { $$ = $1 + $3; }

                                        | exp MINUStoken term
                                        { $$ = $1 - $3; }

                                        | term
                                        { $$ = $1; }

                                        | MINUStoken term
                                        { $$ = -$2;}

term:                                   factor
                                        { $$ = $1;
                                        }
                                        | factor TIMEStoken term
                                        {$$ = $1 * $3;
                                        }
                                        | factor DIVIDEtoken term
                                        { $$ = $1 / $3;
                                        }

However, these rules are not ambiguous, and thus the operator precedence declaration is not required. Yacc will follow the non-ambiguous grammar you have used. The ways these rules are written, it tells yacc that the operators have right associativity, which is the opposite of what you want. Now, it could be clearly seen from the simple arithmetic in your example that the operators were being calculated in a right associative way, and you wanted the opposite. There were really big clues there weren't there?

OK. How to change the associativity? One way would be to make the grammar ambiguous again so that the %left declaration is used, or just flip the rules around to invert the associativity. That's what I did:

exp:                                    term PLUStoken exp
                                        { $$ = $1 + $3; }

                                        | term MINUStoken exp
                                        { $$ = $1 - $3; }

                                        | term
                                        { $$ = $1; }

                                        | MINUStoken term
                                        { $$ = -$2;}

term:                                   factor
                                        { $$ = $1;
                                        }
                                        | term TIMEStoken factor
                                        {$$ = $1 * $3;
                                        }
                                        | term DIVIDEtoken factor
                                        { $$ = $1 / $3;
                                        }

Do you see what I did there? I rotated the grammar rule around the operator.

Now for some more disclaimers. I said this is a dumb exercise. Interpreting expressions on the fly is a poor use of the yacc tool, and not what happens in real compilers or interpreters. In a realistic implementation, a parse tree would be built and the value calculations would be performed during the tree walk. This would then enable the issues of undeclared variables to be resolved (which also occurs in this exercise). The use of the regs array to store values is also dumb, because there is clearly an ancillary symbol table in use to return a unique integer ID for the symbols. In a real compiler/interpreter those values would also be stored in that symbol table.

I hope this tutorial has helped further students understand these parsing issues in their classwork.

YACC Grammar: Operator precedence issue

1 Answers