3
votes

Note: this is a more detailed version of Recursive Descent precedence parsing missing prefix expression

I'm building a simple language parser, and having an issue with lower precedence prefix expressions. Here's an example grammar:

E = E8
E8 = E7 'OR' E8 | E7
E7 = E6 'XOR' E7 | E6
E6 = E5 'AND' E6 | E5
E5 = 'NOT' E5 | E4
E4 = E3 '==' E4 | E3 '!=' E4 | E3
E3 = E2 '<' E3 | E2 '>' E3 | E2
E2 = E1 '+' E2 | E1 '-' E2 | E1 '*' E2 | E1 '+' E2 | E1
E1 = '(' E ')' | 'true' | 'false' | '0'..'9'

However, this grammar doesn't work correctly for the NOT, if it's used as the RHS of a higher precedence infix operator, i.e.:

true == NOT false

This is due to the == operator requiring E3 on the RHS, which cannot be a 'NOT' operation.

I'm unsure the correct way to express this grammar? Is it still possible using this simplistic recursive descent approach, or will I need to move to a more featured algorithm (shunting yard or precedence climbing).

Here are some examples that would need to parse correctly:

  • input true == 1 < 2, output ==(true, <(1, 2))
  • input 1 < 2 == true, output ==(<(1, 2), true)
  • input NOT true == false, output NOT(==(true, false))
  • input true == NOT false, output ==(true, NOT(false)) ** doesn't work
  • input true < NOT false, output <(true, NOT(false)) ** doesn't work

I have attempted to alter the levels E4, E3, and E2 to use E5 on the RHS of the infix expression, as suggested in Recursive Descent precedence parsing missing prefix expression (i.e. E3 '==' E5, E3 '<' E5, etc). However this breaks the precedence between these levels, i.e. true == 1 < 2 would be incorrectly parsed as<(==(true, 1), 2)`.

1
Hmm, I don't see a way, except for adding extra alternatives with the NOT. E.g.: E4 = E3 '==' E3 | E3 '!=' E3 | E3 '==' 'NOT' E3 | E3 '!=' 'NOT' E3 | E3 etc.Bart Kiers
That would get crazy, given NOT wouldn't be the only prefix expression (i.e. also -, +, etc)Chris Leishman
Yeah, I agree. Hence the start of my sentence "I don't see a way", and the fact that I didn't post the suggestion as an answer :)Bart Kiers
This is a language you are defining yourself, right? With your outline above, the relational operators, like == bind harder than the logical operators, like AND. That makes something like A AND B == C AND D parse like A AND (B == C) AND D - is that what you want? I think you probably want the relational operators at the top.500 - Internal Server Error
The standard practice is to make prefix unary operators have the second highest precedence (and postfix unaries should have the highest precedence). It doesn't make much sense to define them differently, for this exact reason.The Paramagnetic Croissant

1 Answers

0
votes

When sticking to the way your language is defined, you cannot have

true == NOT false 

as a valid term in your language. Because then

NOT false == true

would be ambigous: the parse tree could be either

    NOT
     | 
    ==
   /  \
false true

or

   ==
  /  \
 NOT true
  |
false

Note that

true == NOT (false)

is a valid term in your language. A probably more intuitive definition of your language would be to put the NOT-level from E5 down to E2. Then

true == NOT false 
NOT false == true

are both valid and NOT binds with false. And the alternative meaning of the second expression would be expressed as

NOT (false == true)

If these options still do not satisfy you, you have to change/extend the tool. E.g. the yacc/bison parser allows to explicitely define operator precedences; see e.g. here