1
votes

I'm building a simple language parser, and having an issue with lower precedence prefix expressions. Here's an example grammar:

E = E5
E5 = E4 'OR' E4 | E4
E4 = E3 'AND' E3 | E3
E3 = 'NOT' E3 | E2
E2 = E1 '==' E1 | E1
E1 = '(' E ')' | 'true' | 'false'

However, this grammar doesn't work correctly for the NOT, if it's used as the RHS of a higher precedence infix operator, i.e.:

true == NOT false

This is due to the == operator requiring E1 on the RHS, which cannot be a NOT operation.

I'm unsure the correct way to express this grammar? Is it still possible using this simplistic recursive descent approach, or will I need to move to a more featured algorithm (shunting yard or precedence climbing).

3
Note that true == (NOT false) does parse, due to the explicit parenthesis rule that restarts the evaluation from the top level. - Chris Leishman
Maybe use polish notation? E = E5 E5 = 'OR' E4 E4 | E4 E4 = 'AND' E3 E3 | E3 E3 = 'NOT' E3 | E2 E2 = '==' E1 E1 | E1 E1 = '(' E ')' | 'true' | 'false' - user2746020
If I understand correctly, you want not E == E to parse as though it were not (E == E), but E == not E to parse as though it were E == (not E). That's possible but weird. If that's actually what you want, please make it clearer in the question. - rici
It's definitely weird, but that's the only logically valid way to parse E == not E. - Chris Leishman
Note - I've clarified and extended in stackoverflow.com/questions/24337000 - Chris Leishman

3 Answers

1
votes

Assuming the following input and expected parses are correct:

  1. test 1
    • input: true == NOT false
    • output: (true == (NOT false))
  2. test 2
    • input: NOT true == false
    • output: (NOT (true == false))
  3. test 3
    • input: NOT true == NOT false
    • output: (NOT (true == (NOT false)))

Here's an (ANTLR4) grammar that does the trick:

grammar Expr;

e : e5;
e5 : e4 'OR' e5 | e4;
e4 : e3 'AND' e4 | e3;
e3 : 'NOT' e3 | e2;
e2 : e1 '==' e3 | e1;
e1 : '(' e ')' | 'true' | 'false';

S : [ \t\r\n] -> skip;

Parses ANTLR created:

1

enter image description here

2

enter image description here

3

enter image description here

0
votes

Your language is also (unnecessarily) ambiguous. Fixing that helps you fix this problem, too.

Here, D is shorthand for "disjunction", C for conjunction, N for negation, and P for primary, E for equality.

D = C | C 'OR'  D
C = N | N 'AND' C
N = E |   'NOT' N
E = P | P '=='  P
P = '(' E ')' | 'true' | 'false'
-2
votes

Maybe use polish notation?

E = E5
E5 = 'OR' E4 E4 | E4
E4 = 'AND' E3 E3 | E3
E3 = 'NOT' E3 | E2
E2 = '==' E1 E1 | E1
E1 = '(' E ')' | 'true' | 'false'