I'm trying to write PyParsing code capable of parsing any Python code (I know that the AST module exists, but that will just be a starting point - I ultimately want to parse more than just Python code.)
Anyways, I figure I'll just start by writing something able to parse the classic
print("Hello World!")
So here's what I wrote:
from pyparsing import (alphanums, alphas, delimitedList, Forward,
quotedString, removeQuotes, Suppress, Word)
expr = Forward()
string = quotedString.setParseAction(removeQuotes)
call = expr + Suppress('(') + Optional(delimitedList(expr)) + Suppress(')')
name = World(alphas + '_', alphanums + '_')
expr <<= string | name | call
test = 'print("Hello World!")'
print(expr.parseString(test))
When I do that, though, it just spits out:
['print']
Which is technically a valid expr
- you can type that into the REPL and there's no problem parsing it, even if it's useless.
So I thought maybe what I would want is to flip around name
and call
in my expr
definition, so it would prefer returning call
s to name
s, like this:
expr <<= string | call | name
Now I get a maximum recursion depth exceeded error. That makes sense, too:
- Checks if it's an
expr
.- Checks if it's a
string
, it's not. - Checks if it's a
call
.- It must start with an
expr
, return to start of outer list.
- It must start with an
- Checks if it's a
So my question is... how can I define call
and expr
so that I don't end up with an infinite recursion, but also so that it doesn't just stop when it sees the name and ignore the arguments?
Is Python code too complicated for PyParsing to handle? If not, is there any limit to what PyParsing can handle?
(Note - I've included the general tags parsing, abstract-syntax-tree, and bnf, because I suspect this is a general recursive grammar definition problem, not something necessarily specific to pyparsing.)