0
votes

Suppose I have a grammar:

S -> Aaa
A -> a | ε

Clearly, this grammar can parse only sequences aa and aaa. A simple LR(1) parser (or even LL) can parse these when transformed into an equivalent grammar:

S -> aaA
A -> a | ε

Although these grammars are equivalent, their generated parse trees are different. Consider, for the sequence aaa:

    S          S
   / \        / \
  A  aa      aa  A
  |              |
  a              a

Grammars determine whether a sequence is part of a language, instead of providing the parse tree that represents it in the language. The un-transformed grammar cannot parse the sequence (without greater look-ahead); While the transformed grammar can parse it, but builds an invalid parse tree.

How would one go about building a parse tree for a sequence - whose (context-free) grammar can (untransformed) not be represented by an LR-parser?

2
This questions is quite theoretical and as such I believe it belongs to Computer Science SO site.sophros

2 Answers

0
votes

If a grammar has no LR(1) parser, you need to use a different parsing algorithm. In this case, you could use an LR(3) parser. Or you could (in general) use an Earley or GLR parser, which have no lookahead limitations.

I think your question has to do with recovering the original parse from the results of a parse with a transformed grammar. This will depend on the transformation algorithm.

In the example you provide, I think you're using the left-recursion-elimination transformation; this procedure does not preserve derivations, so as far as I know there is no algorithm for recovering the original parse.

There is a different transformation which can be used to construct an LR(1) grammar from.an LR(k) grammar if the value of k is known. That transformation is reversible. However, it's not usually considered practical because it effectively encodes the LR(k) machine into the grammar rules, leading to a massive blow-up of the grammar. It would be equivalent to use a real LR(k) parser, which also has a huge automaton.

0
votes

First, I would say, "Grammars determine whether a sequence is a sentence of a language. You then say that the transformed grammar builds an invalid parse tree. I would say that it builds a different parse tree, which may or may not be useful. But, to your question about building a parse tree from a non-LR grammar. Consider the following grammar that is not LR(k) for any k because it is ambiguous:

E -> E + E | E * E | number

For example:

7 * 4 + 3

There are two distinct parse trees you can build from this sentence precisely due to the ambiguity in the grammar (really this is the definition of an ambiguous grammar). So, the answer to your question is that I wouldn't know how to in the general case.