0
votes

I've written a very simple parser in jison, but there seems to be an S/R conflict in this grammar:

/* lexical grammar */
%lex
%%

\s+                   /* skip whitespace */
":"                   return ':'
"."                   return '.'
[a-zA-Z_][a-zA-Z0-9_]* return 'IDENTIFIER'
<<EOF>>               return 'EOF'
.                     return 'INVALID'

/lex
/* operator associations and precedence */


%start expressions

%% /* language grammar */

expressions
    : statements EOF
        {return $1;}
    ;

statements: statements statement {$$ = [$1].concat($2);} | statement {$$ =
 [$1];};

statement:
    IDENTIFIER ":" grammar_and {[$$ = ["grammar_rule",$1,$3]]};

grammar_and:
    grammar_and IDENTIFIER {$$= [$1,$2]} | IDENTIFIER;

It is intended to parse documents like this one, which consists of 4 statements:

first: this is a sentence
second: this is another sentence
something: more words here another: more words here

but it doesn't work as I intended:

Conflicts encountered:
Resolve S/R conflict (shift by default.)
(1,10, 2,4) -> 1,10

Is it possible to resolve this conflict without modifying the syntax of the language that is being parsed?

1

1 Answers

1
votes

The grammar as written (and, indeed, any plausible grammar for that language) is LR(2), because it is impossible to know whether an IDENTIFIER is the continuation of a grammar_and or the start of a new statement until the following token is examined. In the second case, it would be necessary to reduce statement, and that decision cannot be made with the single token look-ahead.

This is a classic LR(2) grammar -- indeed, it is the natural grammar for bison productions -- and there are a number of ways to solve it.

The language itself is LR(1). In effect, there is no such thing as an LR(2) language because it is possible to mechanically produce an LR(1) grammar from any LR(k) grammar. An example of how to do this with essentially the same grammar is provided in the answer to How do I rewrite a context free grammar so that it is LR(1)?.

A somewhat simpler solution, which is similar to the one used by most yacc-alikes, is to add an extra lookahead in the lexical scanner (or with a shim between the scanner and the parser). An example of C code to do that is provided in the answer to flex&bison shift/reduce conflict. (The question is not identical, but I think the solution could be adapted.)