0
votes

I have the following grammar (Yacc) that is the beginning of a simple C compiler, i'm starting from a simple if statement:

S : E
  ;

E : COND_NO_ELSE
  ;

COND_NO_ELSE : IF BOOL_EXP BLOCK
;

BLOCK : LC EXP RC

BOOL_EXP : LP EXP BOOL_OP EXP RP
     ;

BOOL_OP : LT_OP
     | GT_OP
     | LE_OP
     | GE_OP
     | EQ_OP
     | NE_OP
     ;

MATH_OP : PLUS_OP
    ;

EXP : IDENTIFIER
    | EXP MATH_OP EXP
    ;

Here are the lexical analyzer scanner for the relevant rules:

"="       { yylval.string = strdup(yytext); return ASSIGN;}
"+"       { yylval.string = strdup(yytext); return PLUS_OP;}
"-"       { yylval.string = strdup(yytext); return MINUS_OP;}
"*"       { yylval.string = strdup(yytext); return MULTIPLY_OP;}
"/"       { yylval.string = strdup(yytext); return DIV_OP;}
"%"       { yylval.string = strdup(yytext); return MOD_OP;}

"<"       { yylval.string = strdup(yytext); return LT_OP;}
">"       { yylval.string = strdup(yytext); return GT_OP; }
"<="       { yylval.string = strdup(yytext); return LE_OP; }
">="       { yylval.string = strdup(yytext); return GE_OP; }
"=="       { yylval.string = strdup(yytext); return EQ_OP; }
"!="       { yylval.string = strdup(yytext); return NE_OP; }
"("       { yylval.string = strdup(yytext); return LP; }
")"       { yylval.string = strdup(yytext); return RP; }
"{"       { yylval.string = strdup(yytext); return LC; }
"}"       { yylval.string = strdup(yytext); return RC; }
if { return IF; }

I do know that the conflict started when i added the MATH_OP (i had all math operators than and got 5 conflicts, than removed all but the PLUS_OP and got 1 shift/reduce conflict).

i used the -v flag for the output file as suggested here, and checked this question but it doesn't resemble my grammar too much...

How can i find the conflict?

1

1 Answers

2
votes

Your grammar includes the production:

EXP : EXP MATH_OP EXP

which is inherently ambiguous. Suppose you have two operators:

1 + 2 * 3

Obviously, the above is EXP MATH_OP EXP (because there is no other option), but is that

[EXP: 1 + 2] [MATH_OP *] [EXP: 3]

or is it

[EXP: 1] [MATH_OP +] [EXP: 2 * 3]

Those two parses obviously have different semantics, and the grammar allows both.

Even if you have a single operator, there is an ambiguity (in fact, the same ambiguity), although it happens that with the usual definition of + the evaluation will be the same. (It would be different with the single operator -, which makes the ambiguity slightly clearer.)

There are two normal strategies for creating a yacc/bison grammar for arithmetic expressions:

  1. Explicitly indicate how the expressions are to be parsed. (Example from Wikipedia).

  2. Use precedence declarations to implicitly restrict the parse. (Example from bison manual.)

The first strategy is wordier but more flexible; it is probably better if you are trying to learn how LR parsing works, because it is explicit. The second strategy is more compact and arguably easier to read (because many casual readers understand lists of operator precedence better than context-free grammars) but understanding how it works in detail is a bit more work.

In neither case can you just lump all operators into a single non-terminal like MATH_OP, because operators with different precedences are syntactically different.

You'll also find lots of relevant questions and answers in this site.


By the way, it is very rarely useful to pass the string value of a purely syntactic token like + to the parser. The parser has no need for that value because it already knows the token type, so the strdup() is unnecessary overhead and the corresponding free() clutters the grammar actions (also, it is not obvious where to put it). If you think you need the strings in order to trace the grammar actions, check out bison's debug facilities which are much easier to use and more reliable than sprinkling printfs throughout your parser. If you're not using the semantic value for the operators at all, then you obviously don't need to go to the trouble of duplicating a string and then freeing or leaking the memory.