3
votes

I have a grammar which uses parentheses and square brackets as delimiters. When the parser generated by bison is given input with unbalanced delimiters, the error location in the YYLTYPE* passed to yyerror is the end of the input. So, for example, on input xx(yy, within void yyerror(YYLTYPE* yylloc, Context* ctx, const char* msg) I find that yylloc->first_column and yylloc->last_column are both 5.

Unfortunately, the end of the input is not the most salient location for indicating an unmatched delimiter. Much more useful would be the position of the left parenthesis or left square bracket for which there was no match. (In the example, that would be the left parenthesis at offset 2.) I gather that this information is available in the parse stack (there has to be some n such that $-n is the unmatched ( or [ token and @-n is the YYLTYPE struct holding its position), but none of that appears to be available from yyerror.

I'm aware that I could keep a stack of my own for tracking the offsets of delimiters and stash that in the Context I'm already passing to yyerror, but that seems inelegant and duplicative, as bison must already be tracking this.
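For reference, the hand-rolled fallback mentioned above (a separate stack of delimiter offsets) might look roughly like the sketch below. The function name first_unbalanced and the fixed MAX_DEPTH limit are hypothetical choices made for illustration; nothing here comes from bison or from the linked grammar, and it scans raw characters rather than tokens.

```c
#include <stddef.h>

/* Assumed nesting limit for this sketch only. */
#define MAX_DEPTH 256

/* Hypothetical helper, not part of bison: returns the 0-based offset of the
   first unbalanced delimiter in s, or -1 if all delimiters balance. */
long first_unbalanced(const char *s)
{
    size_t open[MAX_DEPTH];   /* offsets of currently open '(' and '[' */
    size_t depth = 0;

    for (size_t i = 0; s[i] != '\0'; ++i) {
        char c = s[i];
        if (c == '(' || c == '[') {
            if (depth == MAX_DEPTH)
                return (long)i;          /* nesting too deep: report here */
            open[depth++] = i;
        } else if (c == ')' || c == ']') {
            char want = (c == ')') ? '(' : '[';
            if (depth == 0 || s[open[depth - 1]] != want)
                return (long)i;          /* stray or mismatched closer */
            --depth;
        }
    }
    /* Anything still open is unmatched; report the outermost opener. */
    return depth > 0 ? (long)open[0] : -1;
}
```

On the input xx(yy this reports offset 2, which is the position I would like bison to give me without the duplicate bookkeeping.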

So: How can one prise out of bison the position of the first unbalanced delimiter it encounters in input, so that this is available when producing the message for a parse error?

2
Could you share your grammar (at least the part of it that is handling parens)? - Josh
It's here. Parentheses are handled at line 187. - uckelman

2 Answers

6
votes

You should be able to add either the rule:

atom: '(' error    { /* unmatched left paren at @1 */ }

or

atom: '(' alt error    { /* unmatched left paren at @1 */ }

to get info about unmatched left parentheses. The difference is that the first rule will match an unmatched parenthesis that is not followed by anything parseable (such as at the end of the input), while the second will only match if it is followed by something that looks like a valid alt.

There's possibly an issue if you have any other error productions in your grammar (which you don't in the linked grammar), in which case a different error production might be triggered first. Since there are no other error rules, the first alternative is better.

Note that using yacc/bison error recovery rules does not in any way suppress the syntax error; it just runs some code AFTER a syntax error to attempt to recover. That code can print additional error messages and then abort rather than trying to recover, but those messages will be printed after the syntax error message.
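As an untested sketch of what such an action could look like: the fragment below assumes that locations are enabled (%locations), that the parser was declared with %parse-param { Context* ctx } so that ctx is visible inside actions (the name ctx is my guess), and that yyerror has the signature given in the question. The atom and alt names come from the rules above; the message text is made up.

```yacc
atom: '(' alt ')'   { $$ = $2; }
    | '(' error     {
          /* @1 is the location of the '(' that was never closed. */
          yyerror(&@1, ctx, "unmatched left parenthesis");
          YYABORT;   /* give up instead of attempting to recover */
      }
    ;
```

With this in place, yyerror would be called twice on input like xx(yy: first by bison itself with the end-of-input location, then by the action with the location of the left parenthesis.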

1
votes

Check out the error token in bison: http://www.gnu.org/software/bison/manual/html_node/Error-Recovery.html#Error-Recovery

I imagine something like this would work (I have not tested):

atom:  '(' alt ')'        { $$ = $2; }
    |  '(' alt error '\n' { /* this handles an extra left paren */ }
    |   error alt ')'     { /* this handles an extra right paren */ }
    |   literal
    ;