3
votes

I'm trying to write a Tiger parser. I initially used PyPEG, but due to some difficulties, I went with Arpeggio.

My grammar is really simple.

def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]

def simple_var(): return id

def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

The difficulty resides in the let-in-exp expression. let in let in let in end end end is valid Tiger.

However - currently - Arpeggio doesn't recognize the let-in-exp as is, but as three simple-var. Indeed, going into the ZeroOrMore(exp), it consumes the end, and so doesn't find it for the let-in-exp.

How can one resolve such problem?

2
I think the problem is that 'let', 'in', and 'end' also match as an id. I'm not sure how PyPEG let's you define a negative lookahead, but if you can expand on id to not match the keywords before matching the regexp, then I think your recursion will work.PaulMcG
I searched in other projects for lookaheads, and finally found Not() and And(). That solved it. So thanks a lot! Anyway, isn't there a more idiomatic way to write a PEG grammar?kino

2 Answers

3
votes

Not an Arpeggio solution, but perhaps more idiomatic to your taste?

from pyparsing import (CaselessKeyword,Word,nums,QuotedString,alphas,alphanums,
    Forward,Group,Optional,OneOrMore,ZeroOrMore,delimitedList)

LET,IN,END = map(CaselessKeyword, "let in end".split())

number = Word(nums).setName("number")
string = QuotedString('"')
ident = ~(LET | IN | END) + Word(alphas, alphanums+'_')
ident.setName("ident")

literal = number | string

simple_var = ident

exp = Forward().setName("exp")
let_in_exp = Group(LET + IN + ZeroOrMore(exp) + END).setName("let_in_exp")

param = number | string
params = delimitedList(param)
function_call = ident() + '(' + Optional(params) + ')'

exp <<= let_in_exp | simple_var | literal | function_call

code = OneOrMore(exp)

tests = """\
    let in let in let in end end end
    let in let in let in "blah" end end end
    let in let in let in "blah" end 1729 end end
    """
code.runTests(tests)

I designed pyparsing to permit combining of expressions using arithmetic operators:

  • + -> And
  • | -> Match First
  • ^ -> Or (try all, match longest)
  • ~ -> Not
  • & -> Each (same as And, but in any order)
  • * -> multiple (as in expr*3 instead of expr+expr+expr)

I believe these operators and the plain language class names like OneOrMore make these parsers easier to understand, and to maintain over time.

3
votes

As Paul already noted, you should use Not syntactic predicate to avoid matching keywords by simple_var rule. Also, I suggest not to wrap ZeroOrMore parsing expression in Optional as it is already implied.

Solution for Arpeggio is

from arpeggio import Not, OneOrMore, ZeroOrMore, Optional, EOF, ParserPython
from arpeggio import RegExMatch as _

keyword = ['let', 'in', 'end']   
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]
def simple_var(): return Not(keyword), id
def let_in_exp(): return 'let', 'in', ZeroOrMore(exp), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

parser = ParserPython(code, debug=True)
test = 'let in 42 let in "foo" let in end end end'
parse_tree = parser.parse(test)

This will yield the following parse tree

enter image description here