1
votes

I've been writing a LALR parser using ply and have come across an inconsistency when trying to parse multiplication.

As the full parser link is several thousand lines long I won't include it here, but I've created a simple demonstration:

import ply.lex as lex
import ply.yacc as yacc

tokens = (
    'int',
    'times',
    'plus',
)

precedence = (
    ('left', 'plus'),
    ('left', 'times'),
)

t_ignore = ' \t\n '
t_int = r' \d+ '
t_plus = r' \+ '
t_times = ' \* '

def p_int(args):
    'expr : int'
    args[0] = int(args[1])

def p_times(args):
    '''expr : expr times expr
            | expr expr %prec times'''
    if len(args) == 3:
        args[0] = args[1] * args[2]
    elif len(args) == 4:
        args[0] = args[1] * args[3]

def p_plus(args):
    'expr : expr plus expr'
    args[0] = args[1] + args[3]

lex.lex()
parser = yacc.yacc()

while True:
    s = raw_input('>> ')
    print " = ", parser.parse(s)

There are no shift/reduce conflicts or reduce/reduce conflicts reported by PLY yet I get the following inconsistency:

    >>  1 + 2 3
     =  9
    >>  1 + 2 * 3
     =  7

This seems odd to me since the explicit and implicit times rules have the same precedence. But I think it could be due to the fact that PLY assigns a precedence to the 'times' token and thus shifts it onto the stack in favour of reducing the expression with the p_plus rule. How can I fix this?

Edit: Simpler demonstration.

2
can you just add open to your precedence association? I havent done grammars in a while - Joran Beasley
That might work in this case, but there are other cases to consider. For example '1 + 2 3' => 9 versus '1 + 2 * 3' => 7. - sn6uv
@JoranBeasley I've edited the question to make the example simpler. - sn6uv
can you add expr to your precedences? ... not sure that might break other stuff - Joran Beasley
Yeah, that doesn't work either. I think because expr is a non-terminal and PLY only lets you assign precedence to terminals. Edit: Something similar does work - adding int to precedence since that's the token that is shifted. - sn6uv

2 Answers

1
votes

A quick hack: add the int token to the precedence specification (with precedence of times). The int token will then be shifted onto the symbol stack appropriately. That is (per the original question),

precedence = (
    ('left', 'plus'),
    ('left', 'times', 'int'),
)

This works but is messy when dealing with a potentially large number of tokens (open brackets, symbols, floats, etc.).

>>  1 + 2 3
 =  7

I'd still like to know if there's a more elegant solution to this.

0
votes

There is a more appropriate solution: ditch operator-precedence and define non-terminal symbols expr, term, and factor in the usual way. The reason? Implied multiplication doesn't have a characteristic token upon which to base precedence decisions, and you don't want to manually recover the FIRST set of a non-terminal. Besides, as your grammar grows, that FIRST set will grow with it.

In BNF, consider something akin to:

expr -> term | expr + term | expr - term
term -> factor | term * factor | term factor | term / factor
factor -> number | variable | ( expr )

And oh-by-the-way, you probably want to make an additional nonterminal layer to represent the fact that implicit multiplication-by-adjacency is normally considered to have higher precedence than division, while explicit multiplication typically is interpreted to have the same precedence.

This is not specific to any particular parser generator framework.