2
votes

I'm implementing an OCaml-like language using rust-peg and my parser has a bug. I defined if-statement grammar, but it doesn't work.

I'm guessing the test-case input is parsed as Apply(Apply(Apply(Apply(f, then), 2) else), 4). I mean "then" is parsed as Ident, not keyword.

I have no idea for fixing this apply-expression grammar. Do you have any ideas?

#[derive(Clone, PartialEq, Eq, Debug)]
pub enum Expression {
    Number(i64),
    If {
        cond: Box<Expression>,
        conseq: Box<Expression>,
        alt: Box<Expression>,
    },
    Ident(String),
    Apply(Box<Expression>, Box<Expression>),
}

use peg::parser;
use toplevel::expression;
use Expression::*;

parser! {
pub grammar toplevel() for str {

    rule _() = [' ' | '\n']*

    pub rule expression() -> Expression
        = expr()

    rule expr() -> Expression
        = if_expr()
        / apply_expr()

    rule if_expr() -> Expression
        = "if" _ cond:expr() _ "then" _ conseq:expr() _ "else" _ alt:expr() {
            Expression::If {
                cond: Box::new(cond),
                conseq: Box::new(conseq),
                alt: Box::new(alt)
            }
        }

    rule apply_expr() -> Expression
        = e1:atom() _ e2:atom() { Apply(Box::new(e1), Box::new(e2)) }
        / atom()

    rule atom() -> Expression
        = number()
        / id:ident() { Ident(id) }

    rule number() -> Expression
        = n:$(['0'..='9']+) { Expression::Number(n.parse().unwrap()) }

    rule ident() -> String
        = id:$(['a'..='z' | 'A'..='Z']['a'..='z' | 'A'..='Z' | '0'..='9']*) { id.to_string() }
}}

fn main() {
    assert_eq!(expression("1"), Ok(Number(1)));
    assert_eq!(
        expression("myFunc 10"),
        Ok(Apply(
            Box::new(Ident("myFunc".to_string())),
            Box::new(Number(10))
        ))
    );

    // failed
    assert_eq!(
        expression("if f then 2 else 3"),
        Ok(If {
            cond: Box::new(Ident("f".to_string())),
            conseq: Box::new(Number(2)),
            alt: Box::new(Number(3))
        })
    );
}
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Err(ParseError { location: LineCol { line: 1, column: 11, offset: 10 }, expected: ExpectedSet { expected: {"\"then\"", "\' \' | \'\\n\'"} } })`,
 right: `Ok(If { cond: Ident("f"), conseq: Number(2), alt: Number(3) })`', src/main.rs:64:5
1
I'm not familiar with the peg library, but if it has a way to add predicates to rules, you could create a set of keywords and then add a predicate to the ident rule that requires the identifier to not be in the set of keywords. You'll also want a rule for keywords to make sure that something like ifFunction thenArg elseArg is parsed as a function call rather than as if Function then Arg else Arg.sepp2k

1 Answers

2
votes

PEG uses ordered choice. This means that when you write R = A / B for some rule R, if in a position A succesfully parses, it will never try B, even if the choice of A leads to problems later on. This is the core difference with context-free grammars, and often overlooked.

In particular, when you write apply = atom atom / atom, if it's possible to parse two atoms in a row, it will never try to parse just a single one, even if it means the rest doesn't make sense later.

Combine this with the fact that then and else are perfectly good identifiers in your grammar, you get the issue that you see.