Description:
While reading Compiler Design in C book I came across the following rules to describe a context-free grammar:
a grammar that recognizes a list of one or more statements, each of which is an arithmetic expression followed by a semicolon. Statements are made up of a series of semicolon-delimited expressions, each comprising a series of numbers separated either by asterisks (for multiplication) or plus signs (for addition).
And here is the grammar:
1. statements ::= expression;
2. | expression; statements
3. expression ::= expression + term
4. | term
5. term ::= term * factor
6. | factor
7. factor ::= number
8. | (expression)
The book states that this recursive grammar has a major problem. The right hand side of several productions appear on the left-hand side as in production 3 (And this property is called left recursion) and certain parsers such as recursive-descent parser can't handle left-recursion productions. They just loop forever.
You can understand the problem by considering how the parser decides to apply a particular production when it is replacing a non-terminal that has more than one right hand side. The simple case is evident in Productions 7 and 8. The parser can choose which production to apply when it's expanding a factor by looking at the next input symbol. If this symbol is a number, then the compiler applies Production 7 and replaces the factor with a number. If the next input symbol was an open parenthesis, the parser would use Production 8. The choice between Productions 5 and 6 cannot be solved in this way, however. In the case of Production 6, the right-hand side of term starts with a factor which, in tum, starts with either a number or left parenthesis. Consequently, the parser would like to apply Production 6 when a term is being replaced and the next input symbol is a number or left parenthesis. Production 5-the other right-hand side-starts with a term, which can start with a factor, which can start with a number or left parenthesis, and these are the same symbols that were used to choose Production 6.
Question:
That second quote from the book got me completely lost. So by using an example of some statements as (for example) 5 + (7*4) + 14
:
- What's the difference between factor and term? using the same example
- Why can't a recursive-descent parser handle left-recursion productions? (Explain second quote).