0
votes

I started a project with a grammar that used % (and the word mod) for modulus operators, and now I would like to add % as a trailing unary operator to divide by 100.

A few notes, I don't work with a C-based language, I have implemented my own tokenizer/compiler using the XML output from bison. And my choice of steps are critial for my implementation.

Is there a way I can make my grammar to compile without any shift/reduce errors in a LALR(1) compiler?

Basically the following statements are all valid:

  • 5% -> 0.05
  • 5%%5 -> 0.05 mod 5
  • 5%%%5 -> 0.0005 mod 5 etc.

I just don't know how to formulate this into my grammar:

%token S_NUM

%%

default: mod_term ;

mod_term: _mod_value
    | percent_term ;

_mod_value: mod_term O_PERCENT percent_term ;

percent_term: _percent_value
    | value ;

_percent_value: value O_PERCENT ;

value: S_NUM ;

%%

I also compile it using the following statement: bison -v --report=all --warnings=no-other -Werror=conflicts-sr --xml test.y -o test.y.xml

(Where I force shift/reduce as errors because of my environment)

Any ideas? I've played around with %left and %right specifiers, but no luck.

2
What error do you get, on what states? - user207421
You probably need to add some more examples. If the input was 6 % 5, it would be 6 modulus 5. If the input was 8%% % 5, it would be 8 times 0.01 times 0.01 modulus 5. I'm not sure there is a way to make that unambiguous. When you get to the second % in 8%% % 5, you have to look two tokens ahead to determine that there is a number after the next percent — which means your grammar is not LALR(1). Looking at the first percent, you have to look three tokens ahead to know what you're dealing with. I think you need to redesign your language. - Jonathan Leffler
@JonathanLeffler thanks for the response, I did add the examples you mentioned in my original post. And, yes, you are correct about LALR(1) not being sufficient. My two options are to deprecate the % as modulus and use only mod, or I can redesign the language and implement the parser again to compensate. I was just hoping there was something easier... low hanging fruit. For now I am using mod as modulus and % as percent, and hoping there was no implmentation using this. - Marius
Please post comments after you've edited the question, not before you've edited the question. It's annoying to see "I did add the examples" in a comment only to see that the question has not yet been edited. Basic courtesy, please! - Jonathan Leffler
Yes, I did mean add additional examples. I did see the three in the question — they come across to me as mostly ambiguous. If there's no extra information that can be usefully added, then I stand by my "I don't think the existing design is good; it is inherently ambiguous and not LALR(1)". I'm not sure of the benefit from using a modulus operation on a fractional value by an integer quantity — that's just the fractional value again, assuming a floating-point modulus operation at all. But that's your problem. And I don't think you can make it simply clear how this postscript % works. - Jonathan Leffler

2 Answers

1
votes

The ambiguity you have here is between '%' being a postfix operator and an infix operator. This is very similar to the common expression parser issue with '-' being both a prefix and an infix operator, and you can solve it the same way with an explicit %prec directive. The traditional way to write this would be:

%left '%'            /* left-associative infix operator */
%nonassoc POSTFIX    /* postfix operations are higher precedence */
%token VAL
%%

expr: expr '%' expr
    | expr '%' %prec POSTFIX
    | VAL
    ;

using precedence to solve both the associative ambiguity of infix-% and the precedence ambiguity between infix and postfix.

To solve it without the precedence rules, you need something like:

%token S_NUM O_PERCENT
%%

default: mod_term ;

mod_term: _mod_value
    | _mod_value O_PERCENT mod_term ;

_mod_value: _mod_value O_PERCENT ;
    | S_NUM
    ;

which makes the infix-% right associative instead of left associative. Unfortunately I see no way of solving this without using precedence rules that also makes infix-% left associative. This is due to the fact that you can't decide whether a given '%' token is infix or postfix until you see the token after it, so the non-terminal before the '%' needs to be the same for both rules (_mod_value here or expr in the %prec code)

1
votes

If you are willing to accept an increase in complexity for your parser, you could always turn it into a GLR parser by adding the %glr-parser directive. This will mean the parser will split and explore both states when it reaches the point of the conflict, and then it will remove whichever state fails to parse once it has processed enough tokens. This does require a sufficiently new version of bison, though. Like other people suggest, however, it may be better to redesign the language. Using a GLR parser would mean that you would probably end up using an exponential amount of memory in the number of percents you have in a row, given Bison's existing limitations on GLR parsers.