1
votes

I have worked with a few parsers (Yacc, Bison and Menhir). If I remember correctly, all of them allow for a rule to be empty. Here is an example of what I mean using Menhir, it is the one I have used the most.

some_list:
 | {[]}
 | some_non_empty_list { $1 }

some_non_empty_list:
 | SEMICOLON some_list { $2 }
 | element { [$1] }
 | element some_non_empty_list { $1 :: $2 }

The important part is that some_list that can reduce on nothingness.

My current understanding of the algorithm to build a parsing table (build NFA, build DFA from the NFA, minimize) leads me to think that this would lead to shift/reduce conflicts all over the place. But it clearly doesn't, because my code worked back then.

So how to build a parsing table that can accept those empty rules?

1
Why do you think that an empty rule is any harder to handle than a rule with one right-hand-side token?Ira Baxter
My intuition was that an empty rule means "can reduce at anytime", but your question kind of leads me to think this understanding is wrong. I couldn't yet wrap my head around how we build parsing tables. I thought it was similar to building a lexing table except you didn't accept conflicts.Olivier Melançon
Oversimplifying, a grammar rule L = R1 R2 R3 ; means "reduce to L if you see R1 R2 R3". Actually, that's not right. If we have X= A L B ; then our L rule means "reduce to L if your left context is A, you have seen R1 R2 R3, and the next token is first(B). Works the same if L = R1 R2 ; and L = R1 ;. And the limiting case: L = ; (empty rule). You can't reduce to L unless you have seen its left context, its content, and the beginning of what follows. So you can't reduce to an empty rule at "any time". ...Ira Baxter
... What you need to do is learn how LR parsers work, by learning how to track item sets in while bulding parse states. Do this on paper, once, (painful yes, worth it yes) for a small grammar, and LR parsers will all become clear. You can find this process described in any book on LR parsing including the classic Compilers by Aho et al.Ira Baxter

1 Answers

1
votes

Why do you think that an empty rule is any harder to handle than a rule with one right-hand-side token?

Oversimplifying, a grammar rule L = R1 R2 R3 ; means "reduce to L if you see R1 R2 R3". Unsimplifying, if we have X= A L B ; then our L rule means "reduce to L if your left context is A, you have seen R1 R2 R3, and the next token is first(B).

This idea is the same if L = R1 R2 ; and L = R1 ;.

And even for the limiting case of the (empty rule): L = ;

You can't reduce to L unless you have seen its left context, its content, and the beginning of what follows. So you can't reduce to an empty rule at "any time".

What you need to do is learn how LR parsers work, by learning how to track item sets in while bulding parse states. Do this on paper, once, (painful yes, worth it yes) for a small grammar, and LR parsers will all become clear. You can find this process described in any book on LR parsing including the classic Compilers by Aho et al.