
I'm trying to parse multidimensional arrays with YACC. Here is my lvalue definition:

    lvalue: ID { EM_debug("got lvalue identifier " + to_String($1));
            $$.My_VAR = A_SimpleVar($$.pos, $1);
            $$.size = 0;
            $$.name = $1; }
    | lvalue L_SQUARE_BRACKET exp R_SQUARE_BRACKET { EM_debug("got lvalue[exp]");
            $$.My_VAR = A_SubscriptVar($$.pos, $1.My_VAR, $3.My_AST);
            $$.size = $3.My_AST;
            $$.name = $1.name; }

For the (simplified) input ia[2], it prints "got lvalue identifier ia" and then reports a parse error when it encounters the left bracket. I don't see why this fails: the parser should see the left bracket in its lookahead and shift, rather than reduce immediately like this. How can I prevent it from reducing?


2 Answers


Don't use YACC to distinguish lvalues from rvalues. Because an lvalue is almost always also an rvalue, encoding the distinction in the grammar creates reduce/reduce conflicts, and the grammar is then no longer deterministic.

Use a Semantic Analysis phase to check for lval correctness rather than incorporating it into the YACC grammar.
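As a rough sketch of what such a check might look like in C: the grammar accepts any expression on the left of an assignment, and a later pass walks the tree to decide whether it is assignable. The `Expr` type, the `EXPR_*` kinds, and `is_lvalue` are hypothetical names invented for this illustration; they are not the asker's `A_SimpleVar`/`A_SubscriptVar` types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AST node kinds, made up for this sketch. */
typedef enum { EXPR_VAR, EXPR_SUBSCRIPT, EXPR_INT, EXPR_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    struct Expr *left;   /* array operand for EXPR_SUBSCRIPT, left operand for EXPR_BINOP */
    struct Expr *right;  /* index for EXPR_SUBSCRIPT, right operand for EXPR_BINOP */
} Expr;

/* An expression is assignable if it is a plain variable,
 * or a subscript whose base is itself assignable. */
bool is_lvalue(const Expr *e) {
    if (e == NULL) return false;
    switch (e->kind) {
    case EXPR_VAR:       return true;
    case EXPR_SUBSCRIPT: return is_lvalue(e->left);
    default:             return false;
    }
}
```

With something along these lines, the action for an assignment rule would build the node unconditionally, and the semantic pass would report an error whenever `is_lvalue` returns false for the left-hand side.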

For reference though, GNU Bison handles reduce/reduce conflicts by reducing by the rule which is defined first in the file. So that might help you temporarily get around your problem.


On the contrary, the reduction is completely correct. In order to apply

    lvalue: lvalue L_SQUARE_BRACKET exp R_SQUARE_BRACKET

to the input

    ia[2]

the parser needs to make ia into an lvalue before shifting the [ (assuming that L_SQUARE_BRACKET is a [, see below). It does this by using the rule lvalue: ID, so we can expect that rule to run before the [ is shifted.
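To make the order of actions concrete, here is a toy shift-reduce trace in C. It is only a sketch under simplifying assumptions: it reduces whenever a right-hand side sits on top of the stack, whereas a bison-generated parser uses state tables and a lookahead token. The `exp: NUM` rule, the one-character token codes, and the `trace_parse` function are all invented for the illustration.

```c
#include <stdio.h>
#include <string.h>

/* Append one action line to the trace buffer, truncating on overflow. */
static void emit(char *out, size_t out_size, const char *action) {
    size_t used = strlen(out);
    if (used < out_size) snprintf(out + used, out_size - used, "%s\n", action);
}

/*
 * Toy trace for the hypothetical grammar
 *     lvalue: ID | lvalue '[' exp ']'
 *     exp:    NUM
 * Token codes: 'i' = ID, 'n' = NUM, '[' and ']' stand for themselves.
 * On the stack, 'L' = lvalue and 'E' = exp.
 */
void trace_parse(const char *tokens, char *out, size_t out_size) {
    char stack[64];
    size_t top = 0;                 /* number of symbols on the stack */
    if (out_size == 0) return;
    out[0] = '\0';
    for (;;) {
        if (top >= 1 && stack[top - 1] == 'i') {
            stack[top - 1] = 'L';
            emit(out, out_size, "reduce lvalue: ID");
        } else if (top >= 1 && stack[top - 1] == 'n') {
            stack[top - 1] = 'E';
            emit(out, out_size, "reduce exp: NUM");
        } else if (top >= 4 && memcmp(stack + top - 4, "L[E]", 4) == 0) {
            top -= 3;               /* pop four symbols, push one */
            stack[top - 1] = 'L';
            emit(out, out_size, "reduce lvalue: lvalue '[' exp ']'");
        } else if (*tokens != '\0' && top < sizeof stack) {
            char action[16];
            snprintf(action, sizeof action, "shift %c", *tokens);
            stack[top++] = *tokens++;
            emit(out, out_size, action);
        } else {
            break;                  /* nothing left to shift or reduce */
        }
    }
}
```

Calling `trace_parse("i[n]", buf, sizeof buf)` records the `reduce lvalue: ID` line before the `shift [` line, which matches the debug output in the question: the message from the first action appears before the bracket is consumed.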

So that's not the problem, and there's not enough information in the question to provide a better diagnosis. However, for what it's worth, a few notes:

1) Personally, I find it much less error-prone and easier to read to use literal characters in bison rules:

    lvalue: lvalue '[' exp ']'

which of course needs to be matched with a flex rule which returns the literal characters:

    "["|"]"  { return *yytext; }

(or, using the possibly less readable syntax: [][] which can be extended to a longer list of single character tokens, such as [][(){}<>=+*/-]: just remember that ] must come first and - last in a character class).

It's entirely possible that there is a mismatch between your scanner and your parser which results in the [ not being sent with the correct token type; you certainly need to eliminate that possibility for debugging.

2) Is bison telling you about any conflicts (including shift-reduce conflicts)? Each of these needs to be tracked down and eliminated.

3) How do you know that the syntax error is being generated when the [ is seen? Have you, for example, enabled flex debugging traces (very handy for debugging) and/or bison debugging traces (which I find more useful than scattering print statements in your actions, but YMMV)?