I'm trying to write a Tiger parser. I initially used PyPEG, but due to some difficulties, I went with Arpeggio.
My grammar is really simple.
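# Imports the snippet relies on; `_` is the usual Arpeggio alias for RegExMatch.
from arpeggio import Optional, ZeroOrMore, OneOrMore, EOF
from arpeggio import RegExMatch as _
def number(): return _(r'[0-9]+')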
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')
def literal(): return [number, string]
def simple_var(): return id
def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'
param = [number, string]
params = Optional(param, ZeroOrMore(',', param))
def function_call(): return id, '(', params, ')'
exp = [let_in_exp, simple_var, literal, function_call]
def code(): return OneOrMore(exp), EOF
The difficulty resides in the let-in-exp expression.
let in let in let in end end end
is valid Tiger.
However, currently, Arpeggio doesn't recognize the let-in-exp as is, but as three simple-var. Indeed, going into the ZeroOrMore(exp), it consumes the end, and so doesn't find it for the let-in-exp.
How can one resolve such a problem?
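To make the symptom concrete, here is a small check that uses only the standard `re` module rather than Arpeggio. It applies the same regular expression as the id rule to each word of the sample input. The names `ID_PATTERN`, `source`, and `matched_as_id` are made up for this illustration.

```python
import re

# Same regular expression as the `id` rule in the grammar above.
ID_PATTERN = re.compile(r'[a-zA-Z][a-zA-Z0-9_]*')

source = "let in let in let in end end end"

# Collect every word of the input that the identifier pattern accepts in full.
matched_as_id = [word for word in source.split() if ID_PATTERN.fullmatch(word)]

print(matched_as_id)
```

Every word in the input, `end` included, is accepted by the identifier pattern, so nothing at the regex level prevents simple_var from consuming a keyword.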
... id. I'm not sure how PyPEG lets you define a negative lookahead, but if you can expand on id to not match the keywords before matching the regexp, then I think your recursion will work. – PaulMcG
... Not() and And(). That solved it. So thanks a lot! Anyway, isn't there a more idiomatic way to write a PEG grammar? – kino