1
votes

I'm building a simple language parser, and having an issue with lower precedence prefix expressions. Here's an example grammar:

E = E5
E5 = E4 'OR' E4 | E4
E4 = E3 'AND' E3 | E3
E3 = 'NOT' E3 | E2
E2 = E1 '==' E1 | E1
E1 = '(' E ')' | 'true' | 'false'

However, this grammar doesn't work correctly for the NOT, if it's used as the RHS of a higher precedence infix operator, i.e.:

true == NOT false

This is due to the == operator requiring E1 on the RHS, which cannot be a NOT operation.

I'm unsure the correct way to express this grammar? Is it still possible using this simplistic recursive descent approach, or will I need to move to a more featured algorithm (shunting yard or precedence climbing).

3
Note that true == (NOT false) does parse, due to the explicit parenthesis rule that restarts the evaluation from the top level.Chris Leishman
Maybe use polish notation? E = E5 E5 = 'OR' E4 E4 | E4 E4 = 'AND' E3 E3 | E3 E3 = 'NOT' E3 | E2 E2 = '==' E1 E1 | E1 E1 = '(' E ')' | 'true' | 'false'user2746020
If I understand correctly, you want not E == E to parse as though it were not (E == E), but E == not E to parse as though it were E == (not E). That's possible but weird. If that's actually what you want, please make it clearer in the question.rici
It's definitely weird, but that's the only logically valid way to parse E == not E.Chris Leishman
Note - I've clarified and extended in stackoverflow.com/questions/24337000Chris Leishman

3 Answers

1
votes

Assuming the following input and expected parses are correct:

  1. test 1
    • input: true == NOT false
    • output: (true == (NOT false))
  2. test 2
    • input: NOT true == false
    • output: (NOT (true == false))
  3. test 3
    • input: NOT true == NOT false
    • output: (NOT (true == (NOT false)))

Here's an (ANTLR4) grammar that does the trick:

grammar Expr;

e : e5;
e5 : e4 'OR' e5 | e4;
e4 : e3 'AND' e4 | e3;
e3 : 'NOT' e3 | e2;
e2 : e1 '==' e3 | e1;
e1 : '(' e ')' | 'true' | 'false';

S : [ \t\r\n] -> skip;

Parses ANTLR created:

1

enter image description here

2

enter image description here

3

enter image description here

0
votes

Your language is also (unnecessarily) ambiguous. Fixing that helps you fix this problem, too.

Here, D is shorthand for "disjunction", C for conjunction, N for negation, and P for primary, E for equality.

D = C | C 'OR'  D
C = N | N 'AND' C
N = E |   'NOT' N
E = P | P '=='  P
P = '(' E ')' | 'true' | 'false'
-2
votes

Maybe use polish notation?

E = E5
E5 = 'OR' E4 E4 | E4
E4 = 'AND' E3 E3 | E3
E3 = 'NOT' E3 | E2
E2 = '==' E1 E1 | E1
E1 = '(' E ')' | 'true' | 'false'