6
votes

I'm trying to learn basics of the Lemon parser generator, but I stuck quickly.

Here's a tiny grammar:

%right PLUS_PLUS.
%left DOT.

program ::= expr.

member_expr ::= expr DOT IDENTIFIER.

lhs_expr ::= member_expr.

expr ::= lhs_expr.
expr ::= PLUS_PLUS lhs_expr.

It causes 1 parsing conflict:

State 3:
      (3) expr ::= lhs_expr *
      (4) expr ::= PLUS_PLUS lhs_expr *

                           DOT reduce       3      expr ::= lhs_expr
                           DOT reduce       4       ** Parsing conflict **
                     {default} reduce       4      expr ::= PLUS_PLUS lhs_expr

Whereas, if I rewrite the last rule as follows:

expr ::= PLUS_PLUS expr DOT IDENTIFIER.

Then it causes no conflicts. But I don't think that this is the right way to go.

I'd appreciate if someone could explain what's the right way, and why.

1
I suggest you think of this without the extra names for things. Everywhere you have lhs_expr you could simply write expr DOT IDENTIFIER to see clearly what is really being asked for. If everything is in terms of expr you can see the conflict more clearly.Alan Mimms
Thing is, lhs_expr can be something other than expr DOT IDENTIFIER. This particular grammar does not contain other rules for it, because I wanted it to be very minimal, but it can also be e.g. IDENTIFIER, or expr LBRACKET expr RBRACKET, etc. And I want to write the rule PLUS_PLUS lhs_expr just once, and cover all possible left-hand side expressions being pre-incremented.Dmitry Frank

1 Answers

5
votes

So you wrote an ambiguous grammar, which says to accept:

 ++ x . y

with two interpretations:

 [++ x ] . y

and

 ++ [x . y]

where the [ ] are just my way to show to the groupings.

Lemon is an L(AL)R parser, and such parsers simply don't handle ambiguities (multiple interpretations). The reduce-reduce conflict reported is what happens when the parser hits that middle dot; does it group "++ x" as "[++ x] ." or as "++ [ x .]"? Both choices are valid, and it can't choose safely.

If you stick with Lemon (or another LALR parser generator), you have to get rid of the problem by changing the grammar. [You could use a GLR parser generator; it would accept and give you both parses. But all you've done then is to push the problem of resolving the ambiguity to the post-parsing phrase. As you don't want an ambiguity, you might as well avoid it during parsing if you can. In this case I think you can.]

I think you are trying to build a C-like language. So you want something like this:

primitive_target ::= IDENTIFIER ;
primitive_target ::= IDENTIFIER '[' expr ']' ;
access_path ::= primitive_target ;
access_path ::= access_path '.' primitive_target ;

lhs ::= access_path ;
lhs ::= PLUS_PLUS access_path ;
lhs ::= access_path PLUS_PLUS ;

program ::= expr ;

expr ::= term ;
expr ::= expr '+' term ;
term :::= '(' expr ')' ;
term ::= lhs ;
term ::= lhs '=' expr ;
term ::= constant ;