Yacc parse generator error

Question

I am pretty sure I have conflicting YACC rules (specifically the exp exp and group_open exp group_close rules). I am trying to build a simple boolean query syntax that lets people do stuff like a "b c" -(d or e) which would rougly be equivalent to a AND "b c" AND NOT (d OR e).

However I am having trouble implemnting both the group rule () and the AND rule (basically just spaces).

%lex
%%

\s+          ;
or|OR        return 'or';
and|AND      return 'and';
\"[^\"]+\"   return 'phrase';
"-"\b        return 'not';
"("          return 'group_open';
")"          return 'group_close';
[^\s,]+      return 'word';

/lex

%token space
%token phrase
%token group_open
%token group_close
%token word
%left or
%left and
%left not

%%

query  : exp                         { return $1; }
       ;

exp    : term
       | exp or exp                  { $$ = $1+" OR "+$3; }
       | exp and exp                 { $$ = $1+" AND "+$3; }

       /* this is the one that is casuing me issues */
       | exp exp                     { $$ = $1+" AND "+$3; }

       | not exp                     { $$ = "NOT "+$2; }
       | group_open exp group_close  { $$ = "("+$2+")"; }
       ;

term   : phrase                      { $$ = "PHRASE"; }
       | word                        { $$ = "WORD"; }
       ;

Any help would be great.

I am testing my grammar by using jison.org

Below are the errors I am getting

Conflicts encountered:
Resolve S/R conflict (shift by default.)
(1,8, 2,5) -> 1,8Resolve S/R conflict (shift by default.)
(1,9, 2,5) -> 1,9Resolve S/R conflict (shift by default.)
(1,6, 2,5) -> 1,6Resolve S/R conflict (shift by default.)
(1,7, 2,5) -> 1,7Resolve S/R conflict (shift by default.)
(1,4, 2,5) -> 1,4Resolve S/R conflict (shift by default.)
(1,5, 2,5) -> 1,5Resolve S/R conflict (shift by default.)
(1,6, 2,6) -> 1,6Resolve S/R conflict (shift by default.)
(1,7, 2,6) -> 1,7Resolve S/R conflict (shift by default.)
(1,5, 2,6) -> 1,5Resolve S/R conflict (shift by default.)
(1,6, 2,3) -> 1,6Resolve S/R conflict (shift by default.)
(1,7, 2,3) -> 1,7Resolve S/R conflict (shift by default.)
(1,5, 2,3) -> 1,5Resolve S/R conflict (shift by default.)
(1,6, 2,4) -> 1,6Resolve S/R conflict (shift by default.)
(1,7, 2,4) -> 1,7Resolve S/R conflict (shift by default.)
(1,5, 2,4) -> 1,5

Chris Dodd Chris Dodd · Accepted Answer · 2014-06-13T23:19:31

Using precedence rules to resolve this conflict really requires understanding the details of how LR parsing works, and how yacc precendece levels are used to resolve shift/reduce conflicts.

Ambiguities in an expression grammar like yours manifest as shift/reduce conflicts where the parser does not know whether to reduce a rule for an operator that it has parsed or shift a token that might lead to some higher precedence operation. If the rule that the shift leads to is higher precedence, then it should be shifted, but sometimes it is tough to know what rule the token will lead to.

In your example, having recognized the RHS of some rule that ends with exp, and looking at a lookahead token that can be the start of another exp, it needs to reduce if the rule seen is higher precedence than an exp exp expression, and shift otherwise. So you need to set the precedence of every token that might start an expression as just lower than the precedence of the exp exp rule (assuming you want left associativity), and higher than other lower precedence things:

%left or
%nonassoc phrase word group_open not
%left and
%left UNARY

%%

query  : exp                         { return $1; }
       ;

exp    : term
       | exp or exp                  { $$ = $1+" OR "+$3; }
       | exp and exp                 { $$ = $1+" AND "+$3; }
       | exp exp %prec and           { $$ = $1+" AND "+$2; }
       | not exp %prec UNARY         { $$ = "NOT "+$2; }
       | group_open exp group_close  { $$ = "("+$2+")"; }
       ;

term   : phrase                      { $$ = "PHRASE"; }
       | word                        { $$ = "WORD"; }
       ;

Note that not may start a expression, so needs to have lower precedence than exp exp, so we introduce a new fake token UNARY that will never be returned by the lexer; it exists soly to give higher precedence to the not exp rule with the %prec UNARY directive. Also, the exp exp rule needs an explicit %prec directive to give it a precedence level (by default rules get the precedence of the first token on the RHS, but exp exp has no tokens on the RHS).

The above rules make the precedence of exp exp and exp and exp the same and left associative. That means 'a b and c' will be parsed as '(a b) and c', and 'a and b c' will be parsed as '(a and b) c'. If you instead want exp exp to be higher precedence than exp and exp, you need to create another fake token with higher precedence than and and use that for the precedence of exp exp, moving the %nonassoc up to be just below that as well.

Alternately, you can avoid yacc precedence rules altogether, and instead rewrite your grammar with multiple exp rules, one for each precedence level:

query  : exp1          { return $1; } ;
exp1   : exp1 or exp2  { $$ = $1+" OR "+$3; }
       | exp2 ;
exp2   : exp2 and exp3 { $$ = $1+" AND "+$3; }
       | exp2 exp3     { $$ = $1+" AND "+$2; }
       | exp3 ;
exp3   : not exp3      { $$ = "NOT "+$2; }
       | '(' exp1 ')'  { $$ = "("+$2+")"; }
       | phrase        { $$ = "PHRASE"; }
       | word          { $$ = "WORD"; } ;

Yacc parse generator error

1 Answers