Suppose I have a grammar:
S -> Aaa
A -> a | ε
Clearly, this grammar can parse only sequences aa
and aaa
. A simple LR(1) parser (or even LL) can parse these when transformed into an equivalent grammar:
S -> aaA
A -> a | ε
Although these grammars are equivalent, their generated parse trees are different. Consider, for the sequence aaa
:
S S
/ \ / \
A aa aa A
| |
a a
Grammars determine whether a sequence is part of a language, instead of providing the parse tree that represents it in the language. The un-transformed grammar cannot parse the sequence (without greater look-ahead); While the transformed grammar can parse it, but builds an invalid parse tree.
How would one go about building a parse tree for a sequence - whose (context-free) grammar can (untransformed) not be represented by an LR-parser?