parsing a file with specific format in ply (python)

Question

I have a problem with Ply. I have to read a file that contains a token list and a grammar (BNF). I wrote a grammar to recognize the input, and it is almost working (only minor issues remain, which we are fixing). For example, this is a valid input file:

#tokens = NUM PLUS TIMES
exp : exp PLUS exp | exp TIMES exp
exp : NUM

(We don't care, in this case, whether the grammar is ambiguous; this is just an example of the input.)

Parsing every line separately works fine, but I want to parse the whole file with these rules:

#tokens may appear only on the first line, so a #tokens declaration after the grammar is not valid
there can be 0 or more blank lines after every line of "code"
there can be as many grammar rules as you want

I tried using a loop to scan and parse every line separately, but then I can't enforce the first (and really important) rule, so I tried this in my .py file:

I defined t_NLINEA (newline), because I also had problems using the \n character as a literal. The file is opened in rU mode to avoid conflicts between \r\n and \n line endings. Then I added these rules:

def p_S(p):
    '''S : T N U'''
    print("OK")

def p_N(p):
    '''N : NLINEA N'''
    pass

def p_N2(p):
    '''N : '''
    pass

def p_U(p):
    '''U : R N U'''
    pass

def p_U2(p):
    '''U : '''
    pass

(As I said above, I had to use the N rule because Ply didn't accept the \n literal in my grammar, even though I added \n to the "literals" variable.)

T is the rule that parses the #tokens declaration and R is the rule that parses grammar rules. T and R work fine when I use them on a single-line string, but when I add the productions above I get a syntax error while parsing the first grammar rule. For example, with A : B C I get a syntax error at the ":".

Any suggestions? Thanks.

swstephe · Accepted Answer · 2013-11-06T19:59:35

Ply takes the first grammar rule it finds as the "starting rule" (unless you set start explicitly). With what you have written, it will make "exp" the start rule, which says there is only one expression per string or file. If you want multiple expressions, you probably want a list of expressions:

def p_exp_list(p):
    """exp_list :
                | exp_list exp
    """
    if len(p) == 1:
       p[0] = []
    else:
       p[0] = p[1] + [p[2]]

Then your starting rule will be "exp_list" (define it before the other rules, or set start = 'exp_list'). This would allow multiple expressions on each line. If you want to limit it to one expression per line, then how about:

def p_line_list(p):
    """line_list :
                 | line_list line
    """
    if len(p) == 1:
       p[0] = []
    else:
       p[0] = p[1] + [p[2]]

def p_line(p):
    """line : exp NL"""
    p[0] = p[1]

I don't think you can use newline as a literal (because it might mess up regular expressions). You probably need a more specific token rule:

t_NL = r'\r?\n'

Pretty sure this would work, but haven't tried it as there isn't enough to go on.
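
To see why the exact shape of the newline pattern matters, here is a quick check with the standard re module only (no Ply involved). It compares a character class such as [\r*\n], which matches one character out of CR, a literal "*", or LF, with the sequence \r?\n. The names CLASS_NL, SEQUENCE_NL and matches_whole are made up for this sketch:

```python
import re

# Character class: exactly ONE character, which may be CR, a literal '*', or LF.
CLASS_NL = re.compile(r'[\r*\n]')
# Sequence: an optional CR followed by LF, i.e. "\n" or "\r\n".
SEQUENCE_NL = re.compile(r'\r?\n')

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

for sample in ['\n', '\r\n', '*']:
    print(repr(sample),
          'class:', matches_whole(CLASS_NL, sample),
          'sequence:', matches_whole(SEQUENCE_NL, sample))
```

Since the file is opened in rU mode, only \n should reach the lexer anyway, so a plain \n pattern would probably do as well.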

As for the "#token" line, you could just skip it, if it doesn't appear anywhere else:

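def t_COMMENT(t):
    r'\#.*'
    # '#' is escaped because Ply compiles token regexes in verbose mode,
    # where a bare '#' starts a regex comment; '.' already stops at end of line
    pass # ignore this token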
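
Skipping the line in the lexer means the parser never sees it, so the requirement that #tokens appears only on the first line would have to be checked somewhere else. One option is a small pre-pass over the text before it reaches Ply. This is only a sketch: split_header is a hypothetical helper (not part of Ply), and it assumes the declaration must sit on the very first line with no blank lines before it:

```python
def split_header(text):
    """Split the input into (token_names, grammar_lines).

    Raises ValueError if the first line is not a #tokens declaration
    or if a #tokens declaration shows up on any later line.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith('#tokens'):
        raise ValueError("first line must be the #tokens declaration")

    # Everything after '=' on the first line is the token list.
    _, _, names = lines[0].partition('=')

    body = lines[1:]
    for lineno, line in enumerate(body, start=2):
        if line.lstrip().startswith('#tokens'):
            raise ValueError(
                "#tokens is only allowed on line 1 (found on line %d)" % lineno)

    # Blank lines between rules are allowed, so drop them here.
    grammar_lines = [line for line in body if line.strip()]
    return names.split(), grammar_lines


SAMPLE = (
    "#tokens = NUM PLUS TIMES\n"
    "exp : exp PLUS exp | exp TIMES exp\n"
    "\n"
    "exp : NUM\n"
)

token_names, grammar_lines = split_header(SAMPLE)
print(token_names)
print(grammar_lines)
```

The grammar lines could then be joined again and handed to the parser, which at that point only has to deal with rules and newlines.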
