I'm trying to learn how to write a compiler. To that end, I have read a lot about context-free languages, but there are some things I can't figure out on my own yet.

Since this is my first compiler, there are some practices I'm not aware of. My questions are asked with building a parser generator in mind, not a compiler or a lexer. Some of them may be obvious.

Among my readings are: Bottom-Up Parsing, Top-Down Parsing, and Formal Grammars. The picture shown comes from Miscellaneous Parsing. All of them come from the Stanford CS143 class.

Parsers / Grammars Hierarchy

Here are my questions:

0) How do the properties (ambiguous / unambiguous) and (left-recursive / right-recursive) influence the need for one algorithm or another? Are there other ways to classify a grammar?

1) An ambiguous grammar is one that has several parse trees for some string. But shouldn't choosing a leftmost or a rightmost derivation make the parse tree unique?

[EDIT: Answered here ]

2.1) But still, is the ambiguity of a grammar related to k? I mean, given an LR(2) grammar, is it ambiguous for an LR(1) parser and unambiguous for an LR(2) one?

[EDIT: No, it's not. An LR(2) grammar means that the parser needs two tokens of lookahead to choose the right rule. An ambiguous grammar, on the other hand, is one that yields several parse trees for some input.]

2.2) So an LR(*) parser, with as much lookahead as you can imagine, would never find a grammar ambiguous and could therefore parse the entire set of context-free languages?

[EDIT: Answered by Ira Baxter: LR(*) is less powerful than GLR, in that it can't handle multiple parse trees.]

3) Depending on the previous answers, what follows may be self-contradictory. Considering LR parsing, do ambiguous grammars trigger shift-reduce conflicts? Can an unambiguous grammar trigger one too? Likewise, what about reduce-reduce conflicts?

[EDIT: That's it: ambiguous grammars lead to shift-reduce and reduce-reduce conflicts. By contraposition, if there are no conflicts, the grammar is unambiguous.]

4) The ability to parse left-recursive grammars is an advantage of LR(k) parsers over LL(k) parsers. Is it the only difference between them?

[EDIT: yes. ]

5) Given G1:

G1 :
S -> S + S
S -> S - S
S -> a
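
To make the ambiguity of G1 concrete, here is a minimal sketch that counts the distinct parse trees G1 admits for a token sequence. The function name `count_parse_trees_g1` and the token-list input format are assumptions made for illustration; the grammar rules are hard-coded rather than read from a grammar description.

```python
from functools import lru_cache

def count_parse_trees_g1(tokens):
    """Count parse trees of `tokens` under G1: S -> S + S | S - S | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S can derive the slice tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "a" else 0  # rule S -> a
        total = 0
        # Rules S -> S + S and S -> S - S: try every operator
        # position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees_g1("a + a".split()))      # one tree
print(count_parse_trees_g1("a + a - a".split()))  # more than one tree
```

Any input for which the count exceeds 1 is a witness that the grammar is ambiguous; `a + a - a` is already enough for G1, because either operator can sit at the root.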

5.1) G1 is left-recursive, right-recursive, and ambiguous, am I right? Is it an LR(2) grammar? One could make it unambiguous:

G2 :
S -> S + a
S -> S - a
S -> a

5.2) Is G2 still ambiguous? Does a parser for G2 need two tokens of lookahead? By factoring, we have:

G3 :
S -> S V
V -> + a
V -> - a
S -> a

5.3) Now, does a parser for G3 need only one token of lookahead? What are the drawbacks of these transformations? Is LR(1) the minimal parser required?

5.4) G1 is left-recursive; in order to parse it with an LL parser, one needs to transform it into a right-recursive grammar:

G4 :
S -> a + S
S -> a - S
S -> a

then

G5 :
S -> a V
V -> - V
V -> + V
V -> a

5.5) Does G4 need at least an LL(2) parser? G5 only needs an LL(1) parser, G1-G5 all define the same language, and this language is ( a (+/- a)^n ). Is that true?
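
One way to sanity-check a claim like "G1-G5 define the same language" is to enumerate every sentence up to a small length and compare the sets. The sketch below is a brute-force check, not a proof; the `sentences` helper and the dictionary encoding of the grammars are assumptions made for illustration, and the pruning relies on none of the grammars having empty productions.

```python
from collections import deque

def sentences(grammar, start, max_len):
    """All terminal strings of at most max_len tokens derivable from start.

    Assumes no epsilon productions, so a sentential form never shrinks
    and anything longer than max_len can be discarded.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            result.add(" ".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

GRAMMARS = {
    "G1": {"S": [["S", "+", "S"], ["S", "-", "S"], ["a"]]},
    "G2": {"S": [["S", "+", "a"], ["S", "-", "a"], ["a"]]},
    "G3": {"S": [["S", "V"], ["a"]], "V": [["+", "a"], ["-", "a"]]},
    "G4": {"S": [["a", "+", "S"], ["a", "-", "S"], ["a"]]},
    "G5": {"S": [["a", "V"]], "V": [["-", "V"], ["+", "V"], ["a"]]},
}

langs = {name: sentences(g, "S", 5) for name, g in GRAMMARS.items()}
for name in sorted(langs):
    print(name, sorted(langs[name]))
```

With a bound of 5 tokens, G1 to G4 produce the same set of strings, while G5 as written differs: it derives `a a` (S -> a V -> a a) and cannot derive a lone `a`, so it may need another look.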

5.6) For each grammar G1 to G5, what is the smallest class to which it belongs?

6) Finally, since many different grammars may define the same language, how does one choose the grammar and the associated parser? Is the resulting parse tree important? What is the influence of the parse tree?

I'm asking a lot, and I don't really expect a complete answer, but any help would be much appreciated.

Thanks for reading!

1 Answer

"Many grammars may define the same language, how does one choose..."?

Usually, you choose the one that meets the following criteria:

  • conceptually as simple as you can make it (implication: smaller than others)
  • tracks the terminology in the language reference manual where possible
  • least amount of bending to meet the constraints of your parser generator

That last one can make a mess of your conceptual simplicity, and your chart of various parser styles shows the number of different issues that you face depending on your choice of generator. This is aggravated by the fact that the choice of generator is often made well before you actually choose the grammar.
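
As an illustration of that kind of bending, here is a minimal sketch of the textbook transformation that removes immediate left recursion so a top-down (LL) tool can accept a grammar: A -> A α | β becomes A -> β A', A' -> α A' | ε. The function name, the list-of-lists encoding, and the `S'` naming scheme are assumptions made for illustration, not the interface of any particular parser generator; the sketch also assumes no rule of the form A -> A.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon.
    """
    # Right-hand sides that start with A itself, minus that leading A.
    recursive = [rhs[1:] for rhs in productions
                 if rhs and rhs[0] == nonterminal]
    # All remaining right-hand sides (the "beta" alternatives).
    others = [rhs for rhs in productions
              if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to rewrite
    tail = nonterminal + "'"  # hypothetical name for the new nonterminal
    return {
        nonterminal: [beta + [tail] for beta in others],
        tail: [alpha + [tail] for alpha in recursive] + [[]],
    }

# The question's G2: S -> S + a | S - a | a
g2 = [["S", "+", "a"], ["S", "-", "a"], ["a"]]
rewritten = eliminate_immediate_left_recursion("S", g2)
for lhs, alternatives in rewritten.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```

The rewritten grammar accepts the same strings, but it has an extra nonterminal and its parse trees lean to the right, so the left-associative structure of the original has to be recovered later. That is the cost in conceptual simplicity.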

One way to minimize grammar bending is to choose a parser generator that handles fully context-free grammars. GLR parsing has this very significant advantage. I've been using it for 15 years and have done dozens of real languages with it.