1
votes

I'm trying to figure out how to remove the indirect left recursion from the logical keyword expressions within my Rust port of a Ruby parser (https://github.com/kenaniah/ruby-parser/blob/master/ruby-parser/src/parsers/expression/logical.rs). The grammar looks like:

E --> N | A | O | t
N --> n E
A --> E a E
O --> E o E
E = expression
A = keyword_and_expression
O = keyword_or_expression
N = keyword_not_expression

How would I go about transforming this to remove the recursion in A and O?

2
As the source code shows I have already removed the left recursion from those (those were cases of direct left recursion, which I know how to handle). It's the indirect recursion that's tripping me up as I'm not sure how to transform it into an equivalent grammar.Kenaniah

2 Answers

0
votes

According to this factorization tool, the resulting grammar would be:

E  -> N
    | A
    | O
    | t
N  -> n E
A  -> n E a E A'
    | O a E A'
    | t a E A'
O  -> n E o E O'
    | n E a E A' o E O'
    | t a E A' o E O'
    | t o E O'
A' -> a E A'
    | ϵ
O' -> a E A' o E O'
    | o E O'
    | ϵ

Looks like the factorizations for A and O ended up being rather complex thanks to the multiple productions of E.

0
votes

I think the problem here is not the indirect recursion but rather the ambiguity.

If it were just indirect recursion, you could simply substitute the right-hand sides of N, A and O, eliminating the indirect recursion:

E → n E
  | E a E
  | E o E
  | t

In order to get rid of the direct left recursion, you can left-factor:

E → n E
  | E A'
  | t
A'→ a E
  | o E

and then remove the remaining left-recursion:

E → n E E'
  | t E'
E'→ ε
  | A' E'
A'→ a E
  | o E

That has no left-recursion, direct or indirect, and every rule starts with a unique terminal. However, it's not LL(1), because there's a first/follow conflict.

That's really not surprising, because the original grammar was highly ambiguous, and left-recursion elimination doesn't eliminate ambiguity. The original grammar really only makes sense if accompanied by an operator precedence table.

That table indicates that AND and OR are left-associative operators with the same precedence (a slightly unusual decision), while NOT is a unary operator with higher precedence. That, in turn, means that the BNF should have been written something like this:

N → n N
  | t
E → A
  | O
  | N
A → E a N
O → E o N
N → n N
  | t

The only difference from the grammar in the OP is the removal of ambiguity, as indicated by the precedence table.

Again, the first step is to substitute non-terminals A and O in order to make the left-recursion direct:

E → E a N
  | E o N
  | N
N → n N
  | t

This is essentially the same grammar as the grammar for arithmetic expressions (ignoring multiplication, since there's only one precedence level), and the left-recursion can be eliminated directly:

E  → N E'
E' → a E
   | o E
   | ε
N  → n N
   | t