How does the yacc/bison LALR(1) algorithm treat “empty” rules?

Question

In an LALR(1) parser, the rules in the grammar are converted into a parse table that effectively says "If you have this input so far, and the lookahead token is X, then shift to state Y, or reduce by rule R".

I have successfully constructed an LALR(1) parser in an interpreted language (Ruby). It does not use a generator; instead it computes a parse table at runtime and evaluates the input using that table. This works surprisingly well, and the table generation turned out to be quite simple, supporting self-referential (recursive) rules and left/right associativity.

One thing I am having difficulty understanding, however, is how yacc/bison conceptually processes empty rules. My parser can't handle them: when generating the table, it looks at each symbol in each rule, recursively, and "empty" is not something that will come from the lexer, nor be reduced by a rule. So how, then, do LALR(1) parsers process the empty rule? Do they treat it specially, or is it a "natural" concept that a valid algorithm should handle without any particular awareness of it?
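For the table-generation side described here, generators generally do not need a special case for empty rules; what they do track is which nonterminals are *nullable* (can derive the empty string), because FIRST sets and LALR(1) lookaheads have to look past such symbols. The following is a minimal Ruby sketch, assuming a hypothetical grammar representation where each rule's right-hand side is an array of symbols and an empty rule is simply `[]`:

```ruby
require "set"

# Hypothetical grammar format: nonterminal => list of right-hand sides.
# Terminals are strings, nonterminals are symbols, an empty rule is [].
GRAMMAR = {
  expr: [
    [],                 # expr: /* empty */
    ["(", :expr, ")"]   # expr: '(' expr ')'
  ]
}.freeze

# Fixed-point computation of the NULLABLE set and FIRST sets.
def nullable_and_first(grammar)
  nullable = Set.new
  first = Hash.new { |h, k| h[k] = Set.new }
  changed = true
  while changed
    changed = false
    grammar.each do |lhs, rhss|
      rhss.each do |rhs|
        # all? is true for an empty array, so an empty rule makes its
        # left-hand side nullable without any special-case code.
        if rhs.all? { |sym| nullable.include?(sym) } && !nullable.include?(lhs)
          nullable << lhs
          changed = true
        end
        # Add FIRST of each leading symbol, stopping at the first non-nullable one.
        rhs.each do |sym|
          additions = grammar.key?(sym) ? first[sym] : Set[sym]
          unless additions.subset?(first[lhs])
            first[lhs].merge(additions)
            changed = true
          end
          break unless nullable.include?(sym)
        end
      end
    end
  end
  [nullable, first]
end
```

For the grammar in this question, `expr` comes out nullable with FIRST(`expr`) = { `'('` }; the empty right-hand side is handled by the same loop as every other rule.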

Take, for example, a rule that matches any number of nested pairs of parentheses with nothing in the middle:

expr:   /* empty */
      | '(' expr ')'
      ;

Input like the following would match this rule:

((((()))))

This means that upon reading '(' and seeing ')' as the lookahead token, the parser's choices:

Shift the ')' (not possible)
Reduce the input according to some other rule (not possible)
Something else...

don't quite fit into the core algorithm of "shift" or "reduce". The parser effectively needs to shift nothing onto the stack, reduce "nothing" to expr, then shift the next token ')', giving '(' expr ')', which of course reduces to expr, and so on.

It's the "shift nothing" that confuses me. How does the parse table convey such a concept? Consider also that it should be possible to invoke a semantic action that returns a value to $$ when the empty rule is reduced. So the simplistic approach of leaving the empty rule out of the parse table, and treating '(' on the stack with ')' in the lookahead as a plain shift, would not genuinely produce the sequence '(' expr ')'; it would simply produce '(' ')'.

I'm sure there are long sections in the dragon book about dealing with such rules. I don't think Stack Overflow is the right venue to discuss it, though; maybe Programmers? — Damien_The_Unbeliever
Thanks for the suggestion on the book; looking at that link now. Stack Overflow seems the right place to me. It's a direct question about the algorithm, not a subjective discussion. Somebody could very well search for this and, if anybody knows the answer, get a quick solution. — d11wtq
I think I actually just figured this out when a fundamental point dawned on me, and it's blindingly obvious and straightforward. I will answer the question, since Googling turns up nothing and it's a fairly natural question ;) — d11wtq

d11wtq · Accepted Answer · 2011-11-23T13:43:07

I had been thinking about this for days, but while writing the question, and in the minutes that followed, something hit me as incredibly obvious and simple.

Reduction by any rule always works the same way: pop X items off the stack, where X is the number of symbols on the rule's right-hand side, then push the rule's left-hand side onto the stack and go to whatever state the table gives for that nonterminal, looked up from the state now on top of the stack.
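The reduce step just described can be sketched in Ruby as below. The stack layout (pairs of state and semantic value), the `Rule` struct, and a `goto_table` hash keyed by `[state, nonterminal]` are assumptions for illustration, not yacc/bison internals. Nothing here distinguishes the empty rule except that its `rhs_size` is 0:

```ruby
# Hypothetical rule description: left-hand side, number of right-hand-side
# symbols, and a semantic action that receives the popped values ($1..$n).
Rule = Struct.new(:lhs, :rhs_size, :action)

# stack is an array of [state, semantic_value] pairs; its bottom entry
# (the start state) is never popped, so stack.last always exists.
def reduce(stack, rule, goto_table)
  # Pop one entry per right-hand-side symbol; for an empty rule this pops nothing.
  popped = stack.pop(rule.rhs_size)
  values = popped.map { |_state, value| value }
  result = rule.action.call(*values)     # the value that becomes $$
  state_below = stack.last[0]            # state uncovered by the pop
  next_state = goto_table.fetch([state_below, rule.lhs])
  stack.push([next_state, result])
  next_state
end
```

With an empty rule such as `Rule.new(:expr, 0, -> { 0 })`, `stack.pop(0)` returns `[]`, the action is called with no arguments, and its result is pushed on top of the unchanged stack; that pushed value is what later shows up as `$2` in `'(' expr ')'`.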

In the case of the empty rule, you don't need to treat "empty" as a concept at all. The parse table simply needs to include an entry that says: given '(' on the stack and anything other than '(' as the lookahead, reduce by the empty rule. Since the empty rule has zero components, popping zero items leaves the stack unchanged; then, when the result of reducing nothing is pushed onto the stack, you are looking at '(' expr, which does appear in the grammar, and everything becomes clear.
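How a generator ends up with that reduce entry can be illustrated with an LR(0) closure. This Ruby sketch assumes a hypothetical item representation `[lhs, rhs, dot]`. After shifting '(', the state's kernel item is `expr -> '(' . expr ')'`; closing over it adds `expr -> .` (the empty rule), whose dot is already at the end, so it is a complete item, and complete items are what become reduce actions. The LALR(1) lookaheads that decide *when* to reduce are computed separately and are not shown:

```ruby
require "set"

GRAMMAR = {
  expr: [
    [],                 # expr: /* empty */
    ["(", :expr, ")"]   # expr: '(' expr ')'
  ]
}.freeze

# LR(0) closure: for every item with a nonterminal right after the dot,
# add that nonterminal's productions with the dot at position 0.
def closure(items, grammar)
  result = Set.new(items)
  worklist = items.dup
  until worklist.empty?
    _lhs, rhs, dot = worklist.pop
    next_sym = rhs[dot]                 # nil when the dot is at the end
    next unless grammar.key?(next_sym)  # only nonterminals are expanded
    grammar[next_sym].each do |prod|
      item = [next_sym, prod, 0]
      worklist << item if result.add?(item)
    end
  end
  result
end

# Complete items (dot at the end) become reduce actions. For an empty
# rule, rhs.size is 0, so the item is complete as soon as it is added.
def reduce_items(items)
  items.select { |_lhs, rhs, dot| dot == rhs.size }
end
```

Calling `closure([[:expr, ["(", :expr, ")"], 1]], GRAMMAR)` yields three items, and `reduce_items` picks out only `[:expr, [], 0]`, the empty rule.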

For example, parsing the input (()) proceeds as follows:

Stack       Lookahead    Remaining Input      Action
--------------------------------------------------------------
$           (            ())$                 Shift '('
$(          (            ))$                  Shift '('
$((         )            )$                   Reduce by /* empty */
$((expr     )            )$                   Shift ')'
$((expr)    )            $                    Reduce by '(' expr ')'
$(expr      )            $                    Shift ')'
$(expr)     $                                 Reduce by '(' expr ')'
$expr       $                                 Accept
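To make the trace concrete, here is a minimal Ruby sketch of a table-driven driver for this grammar. The state numbers and the `ACTION`/`GOTO` tables are hypothetical: they were worked out by hand for this one grammar, not produced by bison, and they list only the exact LALR(1) lookaheads (a yacc-style generator may instead use a default reduction in state 2). The semantic actions compute the nesting depth, which also addresses the `$$` concern from the question: the empty rule's action runs with no arguments, and its return value is pushed like any other.

```ruby
# Hand-built LALR(1) tables for:  expr: /* empty */ | '(' expr ')' ;
RULES = {
  1 => { lhs: :expr, size: 0, action: -> { 0 } },                # expr: /* empty */   $$ = 0
  2 => { lhs: :expr, size: 3, action: ->(_l, e, _r) { e + 1 } }  # expr: '(' expr ')'  $$ = $2 + 1
}.freeze

ACTION = {
  [0, "("] => [:shift, 2], [0, "$"] => [:reduce, 1],
  [1, "$"] => [:accept],
  [2, "("] => [:shift, 2], [2, ")"] => [:reduce, 1],
  [3, ")"] => [:shift, 4],
  [4, ")"] => [:reduce, 2], [4, "$"] => [:reduce, 2]
}.freeze

GOTO = { [0, :expr] => 1, [2, :expr] => 3 }.freeze

# Returns [semantic value of the whole input, list of actions taken].
def parse(input)
  tokens = input.chars + ["$"]   # each character is a token; "$" marks end of input
  stack = [[0, nil]]             # [state, semantic value] pairs
  trace = []
  loop do
    lookahead = tokens.first
    kind, arg = ACTION.fetch([stack.last[0], lookahead]) do
      raise ArgumentError, "syntax error at #{lookahead.inspect}"
    end
    trace << [kind, arg]
    case kind
    when :shift
      stack.push([arg, tokens.shift])
    when :reduce
      rule = RULES.fetch(arg)
      values = stack.pop(rule[:size]).map { |_s, v| v }  # pops nothing for the empty rule
      result = rule[:action].call(*values)
      stack.push([GOTO.fetch([stack.last[0], rule[:lhs]]), result])
    when :accept
      return [stack.last[1], trace]
    end
  end
end
```

Calling `parse("(())")` performs the same shift/reduce sequence as the trace above and returns a depth of 2; `parse("")` reduces by the empty rule straight away and returns 0.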

It "just works" because reducing by the empty rule only requires popping zero items from the stack.
