LALR(1) shift/reduce error using % for both percent and mod

Question

I started a project with a grammar that used % (and the word mod) for modulus operators, and now I would like to add % as a trailing unary operator to divide by 100.

A few notes, I don't work with a C-based language, I have implemented my own tokenizer/compiler using the XML output from bison. And my choice of steps are critial for my implementation.

Is there a way I can make my grammar to compile without any shift/reduce errors in a LALR(1) compiler?

Basically the following statements are all valid:

5% -> 0.05
5%%5 -> 0.05 mod 5
5%%%5 -> 0.0005 mod 5 etc.

I just don't know how to formulate this into my grammar:

%token S_NUM

%%

default: mod_term ;

mod_term: _mod_value
    | percent_term ;

_mod_value: mod_term O_PERCENT percent_term ;

percent_term: _percent_value
    | value ;

_percent_value: value O_PERCENT ;

value: S_NUM ;

%%

I also compile it using the following statement: bison -v --report=all --warnings=no-other -Werror=conflicts-sr --xml test.y -o test.y.xml

(Where I force shift/reduce as errors because of my environment)

Any ideas? I've played around with %left and %right specifiers, but no luck.

You probably need to add some more examples. If the input was 6 % 5, it would be 6 modulus 5. If the input was 8%% % 5, it would be 8 times 0.01 times 0.01 modulus 5. I'm not sure there is a way to make that unambiguous. When you get to the second % in 8%% % 5, you have to look two tokens ahead to determine that there is a number after the next percent — which means your grammar is not LALR(1). Looking at the first percent, you have to look three tokens ahead to know what you're dealing with. I think you need to redesign your language. — Jonathan Leffler
@JonathanLeffler thanks for the response, I did add the examples you mentioned in my original post. And, yes, you are correct about LALR(1) not being sufficient. My two options are to deprecate the % as modulus and use only mod, or I can redesign the language and implement the parser again to compensate. I was just hoping there was something easier... low hanging fruit. For now I am using mod as modulus and % as percent, and hoping there was no implmentation using this. — Marius
Please post comments after you've edited the question, not before you've edited the question. It's annoying to see "I did add the examples" in a comment only to see that the question has not yet been edited. Basic courtesy, please! — Jonathan Leffler
Yes, I did mean add additional examples. I did see the three in the question — they come across to me as mostly ambiguous. If there's no extra information that can be usefully added, then I stand by my "I don't think the existing design is good; it is inherently ambiguous and not LALR(1)". I'm not sure of the benefit from using a modulus operation on a fractional value by an integer quantity — that's just the fractional value again, assuming a floating-point modulus operation at all. But that's your problem. And I don't think you can make it simply clear how this postscript % works. — Jonathan Leffler

Chris Dodd Chris Dodd · Accepted Answer · 2020-05-05T17:04:44

The ambiguity you have here is between '%' being a postfix operator and an infix operator. This is very similar to the common expression parser issue with '-' being both a prefix and an infix operator, and you can solve it the same way with an explicit %prec directive. The traditional way to write this would be:

%left '%'            /* left-associative infix operator */
%nonassoc POSTFIX    /* postfix operations are higher precedence */
%token VAL
%%

expr: expr '%' expr
    | expr '%' %prec POSTFIX
    | VAL
    ;

using precedence to solve both the associative ambiguity of infix-% and the precedence ambiguity between infix and postfix.

To solve it without the precedence rules, you need something like:

%token S_NUM O_PERCENT
%%

default: mod_term ;

mod_term: _mod_value
    | _mod_value O_PERCENT mod_term ;

_mod_value: _mod_value O_PERCENT ;
    | S_NUM
    ;

which makes the infix-% right associative instead of left associative. Unfortunately I see no way of solving this without using precedence rules that also makes infix-% left associative. This is due to the fact that you can't decide whether a given '%' token is infix or postfix until you see the token after it, so the non-terminal before the '%' needs to be the same for both rules (_mod_value here or expr in the %prec code)

LALR(1) shift/reduce error using % for both percent and mod

2 Answers