How to parse this simple grammar? Is it ambiguous?

Question

I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:

S = R | aSc
R = b | RbR

where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar, hopefully, that is correct but I may have completely missunderstood something. If I try to implement this using recursive descent I seem to have a problem when trying to parse abbbc, using left-derivation eg

S => aSc
aSc => aRc

at this point I would have thought that recursive descent would pick the first option in the second production because the next token is b leading to:

 aRc => abc

and we're finished since there are no more non-terminals, which isn't of course abbbc. The only way to show that abbbc is valid is to pick the second option but with one lookahead I assume it would always pick b. I don't think the grammar is ambiguous unless I missed something. So what I am doing wrong?

Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used to do a sanity check that abbbc is a valid sentence and it is.

Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar but in fact need LL(2)? With two lookaheads I could correctly pick the second option in the second production because I now also know there are more tokens to be read and therefore picking b would prematurely terminate the derivation.

templatetypedef templatetypedef · Accepted Answer · 2018-12-13T19:46:58

For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.

Your grammar is indeed ambiguous. This stems from your R nonterminal:

R → b | RbR

Generally speaking, if you have recursive production rules with two copies of the same nonterminal in it, it will lead to ambiguities because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either

expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.

Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.

The nonterminal R that you’ve described here generates strings of odd numbers of b’s in them, so we could try redesigning R to achieve this more directly. An initial try might be something like this:

R → b | bbR

This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).

If you’d like an LL(1) grammar, you could do something like this:

R → bX

X → bbX | ε

This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.

How to parse this simple grammar? Is it ambiguous?

1 Answers