Left-recursive Grammar Identification

Question

Often we would like to refactor a context-free grammar to remove left-recursion. There are numerous algorithms to implement such a transformation; for example here or here.

Such algorithms will restructure a grammar regardless of the presence of left-recursion. This has negative side-effects, such as producing different parse trees from the original grammar, possibly with different associativity. Ideally a grammar would only be transformed if it was absolutely necessary.

Is there an algorithm or tool to identify the presence of left recursion within a grammar? Ideally this might also classify subsets of production rules which contain left recursion.

rici rici · Accepted Answer · 2013-10-22T23:28:37

There is a standard algorithm for identifying nullable non-terminals, which runs in time linear in the size of the grammar (see below). Once you've done that, you can construct the relation A potentially-starts-with B over all non-terminals A, B. (In fact, it's more normal to construct that relationship over all grammatical symbols, since it is also used to construct FIRST sets, but in this case we only need the projection onto non-terminals.)

Having done that, left-recursive non-terminals are all A such that A potentially-starts-with⁺ A, where potentially-starts-with⁺ is:

potentially-starts-with ∘ potentially-starts-with^*

You can use any transitive closure algorithm to compute that relation.

For reference, to detect nullable non-terminals.

Remove all useless symbols.
Attach a pointer to every production, initially at the first position.
Put all the productions into a workqueue.
While possible, find a production to which one of the following applies:
- If the left-hand-side of the production has been marked as an ε-non-terminal, discard the production.
- If the token immediately to the right of the pointer is a terminal, discard the production.
- If there is no token immediately to the right of the pointer (i.e., the pointer is at the end) mark the left-hand-side of the production as an ε-non-terminal and discard the production.
- If the token immediately to the right of the pointer is a non-terminal which has been marked as an ε-non-terminal, advance the pointer one token to the right and return the production to the workqueue.

Once it is no longer possible to select a production from the work queue, all ε-non-terminals have been identified.

Just for fun, a trivial modification of the above algorithm can be used to do step 1. I'll leave it as an exercise (it's also an exercise in the dragon book). Also left as an exercise is the way to make sure the above algorithm executes in linear time.

Left-recursive Grammar Identification

1 Answers