0
votes

I am writing a parser using Bison, but can't seem to get the grammar correct.

There are two conflicts:

Here are some of the rules used around conflict one:

program                 :           function END_OF_FILE {return 0;}
formal_parameters       :           OPEN_PAREN formal_parameter list_E_fparameter CLOSE_PAREN | OPEN_PAREN CLOSE_PAREN
formal_parameter        :           expression_parameter | function_parameter
function                :           return_options IDENTIFIER formal_parameters block
function_parameter      :           return_options IDENTIFIER formal_parameters
expression_parameter    :           VAR identifier_list IDENTIFIER | identifier_list IDENTIFIER
variable_creation       :           identifier_list COLON type SEMI_COLON
labels                  :           LABELS identifier_list SEMI_COLON
list_E_identifiers      :           list_E_identifiers COMMA IDENTIFIER |  
identifier_list         :           IDENTIFIER list_E_identifiers
return_options          :           VOID | IDENTIFIER

State 12 conflicts: 1 reduce/reduce

state 12

   56 identifier_list: IDENTIFIER . list_E_identifiers
   60 return_options: IDENTIFIER .
  102 list_E_identifiers: . list_E_identifiers COMMA IDENTIFIER
  103                   | .

    COMMA       reduce using rule 103 (list_E_identifiers)
    IDENTIFIER  reduce using rule 60 (return_options)
    IDENTIFIER  [reduce using rule 103 (list_E_identifiers)]
    $default    reduce using rule 60 (return_options)

    list_E_identifiers  go to state 23

State 64 conflicts: 1 shift/reduce

state 64

    8 body: OPEN_BRACE list_E_statement . CLOSE_BRACE
   17 statement: . opt_declaration unlabeled_statement
   18          | . compound
   31 compound: . OPEN_BRACE list_NE_unlstatement CLOSE_BRACE
   73 opt_declaration: . IDENTIFIER COLON
   74                | .
   94 list_E_statement: list_E_statement . statement

    CLOSE_BRACE  shift, and go to state 68
    IDENTIFIER   shift, and go to state 69
    OPEN_BRACE   shift, and go to state 70

    IDENTIFIER  [reduce using rule 74 (opt_declaration)]
    $default    reduce using rule 74 (opt_declaration)

    statement        go to state 71
    compound         go to state 72
    opt_declaration  go to state 73

Can anyone help me? I've looked at http://www.gnu.org/software/bison/manual/html_node/Understanding.html but can't understand what this means.

I can post the full grammar if that would help.

Thank you!

1
Is the question "what is a shift/reduce or reduce/reduce conflict?," or "why am I getting these in my grammar?" And yes, you should probably post the grammar here so that we can help out.templatetypedef
Conflicts do not mean that your grammar isn't "correct". The Awk utility was developed in the birthplace of Yacc, yet it's Yacc grammar has over ninety conflicts. The conflicts are resolved automatically; for instance shift/reduce is resolved for you by favoring shifting over reducing. Yacc even has an %expect N directive which tells it to expect a certain number of conflicts and not complain.Kaz
@templatetypedef I am sorry if the question is not correct, I would want either for some one help me find the conflict, or at least help me understand what this debugging output mean so that I can then find the conflict.luisforque
@luisforque: There is no need to split out the "rest" of a list. Just write identifier_list: identifier | identifier_list ',' identifier or parameter_list: formal_parameter | parameter_list ',' formal_parameter, etc. You don't need to use empty productions everywhere and sometimes they'll get you into trouble. (Very occasionally, you do need to specify the prefix without a reduction. But that's very rare.)rici
@rici In the case of parameter_list and identifier_list that you wrote, those cases cannot be 0, correct? In my rule, for example, I need a list of identifiers that have 0 or more elements. That is why I made this way. In your solution, the element need to be 1 or more, correct?luisforque

1 Answers

0
votes

The second conflict is a classic problem with "optional" elements. It's very tempting to write a rule for optional labels as you have done it, but the fact that optional_label could produce nothing forces the parser to try to make a decision before it has enough information.

LR parsers must "reduce" (recognize) a non-terminal before absorbing any further tokens. They can lookahead at the next k tokens (the next 1 token for an LR(1) parser, which is what bison generates), but they can't tentatively use the token and later go back and do the reduction.

So when the parser is at the point where the next token, which is an identifier, should start a statement, it might be looking at a statement which starts with an identifier, or it might be looking at a label which starts with an identifier. It could tell the difference by looking at the colon which follows the identifier (if any) but it can't see that far ahead.

Now, if it weren't for the fact that it is required to reduce either an empty optional_declaration or one containing a label, there would not be a problem. If you had written something like this:

statement: basic_statement | compound
basic_statement: unlabeled_statement | declaration unlabeled_statement
declaration: IDENTIFIER COLON

then the parser would not have to make a decision when it sees the identifier. It only has to make a decision when it reaches the end of a production; it is perfectly capable of pressing forward when there are two possible productions to complete. But when you force it to recognize an optional label, then it has to know whether the label was not there in order to reduce (recognize) the empty production.

For the first conflict, we can see from the output that there is some context in which the lookahead symbol is IDENTIFIER and you could have either a return_options or an identifier_list. Since both of those productions can produce a single IDENTIFIER, the parser will not know which one to reduce.

With the actual grammar available, it is easy to find the context in which both return_options IDENTIFIER and identifier_list IDENTIFIER are possible:

formal_parameter    : expression_parameter | function_parameter
expression_parameter: identifier_list IDENTIFIER
function_parameter  : return_options  IDENTIFIER …

That grammar is not ambiguous. If IDENTIFIER IDENTIFIER is the start of function_parameter, then it must be followed by a (; if it is an expression_parameter, then it must be followed by either , or ). But that's the second next token, which means you'd need an LR(2) parser.

So I will give my usual advice on handling LR(2) grammars. It is possible to rewrite an LR(k) grammar as an LR(1) grammar, regardless of the value of k, but the result is usually bloated and ugly. So if you are using bison and you are willing to live with the possibility that action evaluations could be slightly delayed, then your easiest solution is to ask bison to generate a GLR parser. Often, just adding %glr-parser to the options section is enough.

It's worth noting that your grammar seems to be an uneasy mix between C and Pascal-like syntaxes. In C, the first token in a parameter is always a type; either the return type of a function, or the type of the following identifier. In Pascal, the last token in a parameter is the type. But in your grammar, sometimes the first token is the type and sometimes it's the last token. In a certain sense, it is this inconsistency which leads to the awkwardness in the grammar.

(Pascal has a lot more punctuation: the type is always preceded by a colon and a function parameter is preceded by the word function. These extra tokens are not needed to make the grammar work, but it can be argued that they make the syntax easier to read by humans.)