0
votes

I'm trying describe this syntax on Bison input file: https://courses.engr.illinois.edu/cs421/sp2011/mps/mp2/minijavasyntax.pdf

this is my input:

%start Program
%token KW_CLASS KW_EXTENDS KW_PUBLIC KW_STATIC KW_BOOLEAN KW_STRING KW_FLOAT KW_INT EOF
%token KW_IF KW_WHILE KW_BREAK KW_CONTINUE KW_SWITCH KW_CASE KW_DEFAULT KW_RETURN
%token KW_NEW KW_THIS KW_NULL KW_TRUE KW_FALSE KW_PRINTLN
%token IDENT INT_LITERAL FLOAT_LITERAL STRING_LITERAL

%nonassoc "THEN"
%nonassoc KW_ELSE

%right "STATEMENTS"

%right OP_ASSIGN
%left OP_OR
%left OP_AND
%nonassoc CMP_EQ CMP_NEQ
%nonassoc CMP_GT CMP_LT CMP_GTE CMP_LTE
%left OP_ADD OP_MINUS
%left OP_MULT OP_DIV OP_MOD
%right OP_NOT OP_UNARY "NEW" 
%left "FUNCALL" "SUBSCRIPT"  '.'
%nonassoc '('
%nonassoc ')'
%%
Program:        ClassDeclp EOF
            ;

ClassDeclp:     ClassDecl
            | ClassDeclp ClassDecl
            ;
ClassDecl:      KW_CLASS IDENT ExtendsFrom
            '{' VarDecls MethodDecls '}'
            ;
ExtendsFrom:        /*empty*/
                | KW_EXTENDS IDENT
            ;

VarDecls:       /*empty*/
            | VarDecls VarDecl
            ;
VarDecl:        Type IDENT ';'
            | KW_STATIC Type IDENT ';' /*Co the sua thanh AcessModifier Type IDENT*/
            ;

MethodDecls:        /*empty*/
            | MethodDecls MethodDecl
            ;
MethodDecl:     KW_PUBLIC Type IDENT
            '('MethodParams')'
            '{'VarDecls Statements KW_RETURN Expression ';' '}'
            ;

MethodParams:       /*empty*/
            | MethodParams ',' MethodParam
            ;       
MethodParam:        Type IDENT;

Type :          Type '['']'
                | KW_BOOLEAN
            | KW_STRING
            | KW_FLOAT
            | KW_INT
            | IDENT
            ;
Statements:     Statements Statement    %prec "STATEMENTS"
            | /*empty*/ %prec "STATEMENT"
            ;
Statementp:     Statements Statement %prec "STATEMENTS"
            ;
Statement:      '{'Statements'}'
            | KW_IF '(' Expression ')' Statement %prec "THEN"
            | KW_IF '(' Expression ')' Statement KW_ELSE Statement
            | KW_WHILE '(' Expression ')'Statement
            | KW_PRINTLN '(' Expression ')' ';'
            | IDENT OP_ASSIGN Expression ';'
            | KW_BREAK ';'
            | KW_CONTINUE ';'
            | IDENT %prec "SUBSCRIPT" '['Expression']' '=' Expression ';'
            | KW_SWITCH '(' Expression ')' '{'
              Cases
              KW_DEFAULT ':' Statementp '}'
            ;

Cases:          Cases Case
            | /*empty*/
            ;
Case:           KW_CASE INT_LITERAL ':' Statementp
            ;


Expression:     Expression OP_OR Expression
            | Expression OP_AND Expression
            | Expression CMP_EQ Expression
            | Expression CMP_NEQ Expression
            | Expression CMP_GT Expression
            | Expression CMP_GTE Expression
            | Expression CMP_LT Expression
            | Expression CMP_LTE Expression
            | Expression OP_ADD Expression
            | Expression OP_MINUS Expression
            | Expression OP_MULT Expression
            | Expression OP_DIV Expression
            | Expression OP_MOD Expression
            | '-' Expression %prec OP_UNARY
            | OP_NOT Expression
            | Expression %prec "SUBSCRIPT" '['Expression']'
            | Expression '.'"length"
            | Expression '.' IDENT %prec "FUNCALL" '(' ParamList ')' 
            | INT_LITERAL
            | FLOAT_LITERAL
            | STRING_LITERAL
            | KW_NULL
            | KW_TRUE
            | KW_FALSE
            | IDENT
            | KW_THIS
            | KW_NEW Type '[' Expression ']' %prec "NEW"
            | KW_NEW IDENT '('')'        %prec "NEW"
            | '(' Expression ')'
            ;


ParamList:      /*empty*/
            | ParamList ',' Expression
            | Expression
            ;
%%
main(int argc, char** argv[])
{
    extern FILE *yyin;
    ++argv; --argc;
    yyin = fopen(argv[0], "r");
    yydebug = 1;
    errors = 0;
    yyparse();
}

yyerror(char *s)
{
    printf("%s\n", s);
}

/* Co 3 conflict RR can xu ly khi bien thuoc kieu bool
   giua BoolExpr va Expresstion */

I got two 16 conflicts when compile. One of conflicts, I ran Bison with --report=lookahead:

OP_NOT Expression . [OP_OR, OP_AND, CMP_EQ, CMP_NEQ, CMP_GT, CMP_LT, CMP_GTE, CMP_LTE, OP_ADD, OP_MINUS, OP_MULT, OP_DIV, OP_MOD, ')', ';', ',', '[', ']']

What I expected is '[' not in OP_NOT lookahead token, because SUBSCRIPT precedence must higher than Operator !. Others conflicts are like this. How can I solve it. Tks

1

1 Answers

2
votes

That's not how precedence works.

EDIT: If you find the following description confusing or you don't want to wade through so much English text, you could take my usual advice: Don't use precedence. You can almost always write an unambiguous grammar which doesn't require precedence declarations. And if you do that, you don't need to understand precedence. (Although, honestly, it is not that complicated if you understand how LR parsing works.) /EDIT

Precedence always compares a possible reduction (i.e. a production whose rightt-hand side matches the top of the current parser stack) and the lookahead symbol.

At the point:

Expression : Expression · '[' Expression ']'

the only possible parser action is a shift, because a reduction can only happen when the point is at the end of the right-hand side.

However, in one of the states in which that point occurs, there is another production:

Expression : OP_NOT Expression ·

This one can be reduced, since the point is at the end.

Since both these points are in the same state, they must both be valid. That means that we are looking at:

OP_NOT Expression · '[' Expression ']'

And we are trying to figure out what to do. We could reduce OP_NOT Expression to Expression, at which point we would have:

Expression · '[' Expression ']'

Or we could shift the '[', leaving us with

OP_NOT Expression '[' · Expression ']'

Since both of these are possible, there is a shift/reduce conflict. Yacc/Bison will try to resolve that conflict using a precedence rule if one exists. In particular, it needs to compare the precedence of the production which might be reduced:

Expression : OP_NOT Expression

and the symbol which might be shifted: '['.

However, a thorough look through the precedence declarations shows that there is no precedence assigned to '['. So yacc/bison cannot test it against the production (whose precedence is defined by the last terminal in the right-hand side, OP_NOT, since there is no %prec declaration.

If you want the postfix subscript operator ([ Expression ']') to have higher precedence than the prefix operator OP_NOT, you have to declare a precedence for [ which is higher than OP_NOT.

By the way, I don't see the point of the inconsistency here. You could have used ! for OP_NOT (and - for OP_MINUS and so on), which would be easier to read and less work.

You seem to think that the %prec declaration in

Expression %prec "SUBSCRIPT" '['Expression']'

is relevant. It is not. It would only be applicable if the parser could possible reduce Expression '[' Expression ']'. But it is also pointless, because you don't need to create a fake terminal to hold the precedence of that production; it's precedence is defined by the last terminal on the right-hand side, ']', so you could just declare a precedence for that terminal.

The fake token in the declaration Expression : OP_MINUS Expression %prec OP_UNARY is needed because '-' has two different precedences, or more accurately because OP_MINUS Expression has different precedence from Expresson OP_MINUS Expression. You didn't actually need to invent a fake token, though; you could use any token with the correct precedence, such as OP_NOT or OP_NEW.

In case that wasn't enough words, I have tried to explain this in several different SO answers. Here's one and here's another one and here's another one. Also, here's one by Chris Dodd and here's the documentation in the bison manual. If you're lucky, you might find a description in your own language, by using whatever internet search engine works best for you or by talking with your professor if you have one.

By the way, the lookahead report tells you which symbols might follow a given production. Precedence has no effect on this. [ can definitely follow !a. Precedence tells the parser whether to reduce or shift, but the token which might trigger the reduce or shift is definitely in the lookahead set.