I am learning about recursive decent parsing and come up with some scenarios in which I think the algorithm isn't working. One of them is, considering this simple grammar:
S → E;
E → id | id + id
Then the string id + id;
is valid in the language of this grammar. However if we execute the recursive descent algorithm, it descends from S
to E
then to id
, which is the first matching terminal. Now the input is at the +
and we are back at S
trying to match ;
which then fails; but there is no other rule to choose from at the level of S
.
I think the grammar is not ambiguous as there are only 2 strings in the language id;
and id + id;
, each of which has a unique parse tree. The general problem here is that a non-terminal has productions with same prefixes and potentially makes a choice that matches at a deeper level in the recursion but creates invalid inputs for shallower levels.
I've read about typical problems with recursive descent like left-recursion but haven't found the problem above mentioned anywhere. So is this really a problem or am I missing something?
I found an authoritative answer from the book Parsing Techniques: A Practical Guide p.182-188
which classify the above approach as naive recursive decent and highlights the same problem. There are two solutions that always work for general case with no look-ahead (since in general the required look-ahead length increases with the prefix length): Exhaustive Recursive Descent which requires the use of continuations and Breadth-First Recursive Descent.