7
votes

Description:

While reading Compiler Design in C book I came across the following rules to describe a context-free grammar:

a grammar that recognizes a list of one or more statements, each of which is an arithmetic expression followed by a semicolon. Statements are made up of a series of semicolon-delimited expressions, each comprising a series of numbers separated either by asterisks (for multiplication) or plus signs (for addition).

And here is the grammar:

1. statements ::= expression;
2.                | expression; statements
3. expression ::= expression + term
4.                | term
5. term       ::= term * factor
6.                | factor
7. factor     ::= number
8.                | (expression)

The book states that this recursive grammar has a major problem. The right hand side of several productions appear on the left-hand side as in production 3 (And this property is called left recursion) and certain parsers such as recursive-descent parser can't handle left-recursion productions. They just loop forever.

You can understand the problem by considering how the parser decides to apply a particular production when it is replacing a non-terminal that has more than one right hand side. The simple case is evident in Productions 7 and 8. The parser can choose which production to apply when it's expanding a factor by looking at the next input symbol. If this symbol is a number, then the compiler applies Production 7 and replaces the factor with a number. If the next input symbol was an open parenthesis, the parser would use Production 8. The choice between Productions 5 and 6 cannot be solved in this way, however. In the case of Production 6, the right-hand side of term starts with a factor which, in tum, starts with either a number or left parenthesis. Consequently, the parser would like to apply Production 6 when a term is being replaced and the next input symbol is a number or left parenthesis. Production 5-the other right-hand side-starts with a term, which can start with a factor, which can start with a number or left parenthesis, and these are the same symbols that were used to choose Production 6.


Question:

That second quote from the book got me completely lost. So by using an example of some statements as (for example) 5 + (7*4) + 14:

  1. What's the difference between factor and term? using the same example
  2. Why can't a recursive-descent parser handle left-recursion productions? (Explain second quote).
2
On the other side, I would like to suggest you to keep on querying The Dragon Book as a reference for theoretical topics of compiler. That is a must read book for every compiler-enthusiasts.Am_I_Helpful

2 Answers

5
votes
  1. What's the difference between factor and term? using the same example

I am not giving the same example as it won't give you clear picture of what you have doubt about!

Given,

term       ::= term * factor | factor
factor     ::= number | (expression)

Now,suppose if I ask you to find the factors and terms in the expression 2*3*4. Now,multiplication being left associative, will be evaluated as :-

(2*3)*4

As you can see, here (2*3) is the term and factor is 4(a number). Similarly you can extend this approach upto any level to draw the idea about term.

As per given grammar, if there's a multiplication chain in the given expression, then its sub-part,leaving a single factor, is a term ,which in turn yields another sub-part---the another term, leaving another single factor and so on. This is how expressions are evaluated.

  1. Why can't a recursive-descent parser handle left-recursion productions? (Explain second quote).

Your second statement is quite clear in its essence. A recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the productions of the grammar.

It is said so because it's clear that recursive descent parser will go into infinite loop if the non-terminal keeps on expanding into itself.

Similarly, talking about a recursive descent parser,even with backtracking---When we try to expand a non-terminal, we may eventually find ourselves again trying to expand the same non-terminal without having consumed any input.

A-> Ab

Here,while expanding the non-terminal A can be kept on expanding into

A-> AAb -> AAAb -> ... -> infinite loop of A.

Hence, we prevent left-recursive productions while working with recursive-descent parsers.

3
votes
  1. The rule factor matches the string "1*3", the rule term does not (though it would match "(1*3)". In essence each rule represents one level of precedence. expression contains the operators with the lowest precedence, factor the second lowest and term the highest. If you're in term and you want to use an operator with lower precedence, you need to add parentheses.

  2. If you implement a recursive descent parser using recursive functions, a rule like a ::= b "*" c | d might be implemented like this:

    // Takes the entire input string and the index at which we currently are
    // Returns the index after the rule was matched or throws an exception
    // if the rule failed
    parse_a(input, index) {
      try {
        after_b = parse_b(input, index)
        after_star = parse_string("*", input, after_b)
        after_c = parse_c(input, after_star)
        return after_c
      } catch(ParseFailure) {
        // If one of the rules b, "*" or c did not match, try d instead
        return parse_d(input, index)
      }
    }
    

    Something like this would work fine (in practice you might not actually want to use recursive functions, but the approach you'd use instead would still behave similarly). Now, let's consider the left-recursive rule a ::= a "*" b | c instead:

    parse_a(input, index) {
      try {
        after_a = parse_a(input, index)
        after_star = parse_string("*", input, after_a)
        after_b = parse_c(input, after_star)
        return after_b
      } catch(ParseFailure) {
        // If one of the rules a, "*" or b did not match, try c instead
        return parse_c(input, index)
      }
    }
    

Now the first thing that the function parse_a does is to call itself again at the same index. This recursive call will again call itself. And this will continue ad infinitum, or rather until the stack overflows and the whole program comes crashing down. If we use a more efficient approach instead of recursive functions, we'll actually get an infinite loop rather than a stack overflow. Either way we don't get the result we want.