How can LR parsers generate parse trees?

Question

Suppose I have a grammar:

S -> Aaa
A -> a | ε

Clearly, this grammar can parse only sequences aa and aaa. A simple LR(1) parser (or even LL) can parse these when transformed into an equivalent grammar:

S -> aaA
A -> a | ε

Although these grammars are equivalent, their generated parse trees are different. Consider, for the sequence aaa:

    S          S
   / \        / \
  A  aa      aa  A
  |              |
  a              a

Grammars determine whether a sequence is part of a language, instead of providing the parse tree that represents it in the language. The un-transformed grammar cannot parse the sequence (without greater look-ahead); While the transformed grammar can parse it, but builds an invalid parse tree.

How would one go about building a parse tree for a sequence - whose (context-free) grammar can (untransformed) not be represented by an LR-parser?

This questions is quite theoretical and as such I believe it belongs to Computer Science SO site. — sophros

rici rici · Accepted Answer · 2019-07-08T15:58:40

If a grammar has no LR(1) parser, you need to use a different parsing algorithm. In this case, you could use an LR(3) parser. Or you could (in general) use an Earley or GLR parser, which have no lookahead limitations.

I think your question has to do with recovering the original parse from the results of a parse with a transformed grammar. This will depend on the transformation algorithm.

In the example you provide, I think you're using the left-recursion-elimination transformation; this procedure does not preserve derivations, so as far as I know there is no algorithm for recovering the original parse.

There is a different transformation which can be used to construct an LR(1) grammar from.an LR(k) grammar if the value of k is known. That transformation is reversible. However, it's not usually considered practical because it effectively encodes the LR(k) machine into the grammar rules, leading to a massive blow-up of the grammar. It would be equivalent to use a real LR(k) parser, which also has a huge automaton.

How can LR parsers generate parse trees?

2 Answers