
I'm writing my own scripting language using flex and bison. I have a grammar and can generate a parser that works fine on correct scripts. I would also like to add meaningful error messages for specific error situations, for example to recognize unmatched parentheses around a block of statements, a missing semicolon, and so on. Suppose I have these statements (the grammar here is not complete):

    ...
    statements: statement SEMICOLON statements
        | statement SEMICOLON
        ;

    statement: ifStatement
        | whileStatement
        ;

    ifStatement: IF expression THEN statements END
        | IF expression THEN statements ELSE statements END
        ;

    whileStatement: DO statements WHILE expression END
        ;
    ...

I would like to be able to print messages such as "Missing semicolon" or "Missing then keyword" and so on. Should I modify my grammar to enable error handling? Or is there some Bison feature to do this?

Thanks for your advice. Salvatore

1 Answer


Bison is not really the right tool for generating custom error messages, but its standard messages are reasonably informative once you enable %error-verbose (in Bison 3.0 and later the preferred spelling is %define parse.error verbose). Have a look at the documentation: http://www.gnu.org/software/bison/manual/bison.html#Error-Reporting.
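Whatever the verbosity setting, the message text ends up in a yyerror function that you supply yourself. Here is a minimal sketch in C, assuming the default non-reentrant C parser, which calls void yyerror(const char *msg). The global current_line is a hypothetical counter your lexer would keep up to date (with flex you could enable %option yylineno and use yylineno instead), and last_error is a hypothetical buffer that only exists to make the result easy to inspect:

```c
#include <stdio.h>

/* Hypothetical line counter, assumed to be updated by the lexer. */
int current_line = 1;

/* Hypothetical buffer holding the last diagnostic that was reported. */
char last_error[256];

/* Called by the Bison-generated parser with its own message, for example
   "syntax error, unexpected NUM" when verbose errors are enabled.
   This version only prefixes the current line number. */
void yyerror(const char *msg)
{
    snprintf(last_error, sizeof last_error, "line %d: %s", current_line, msg);
    fprintf(stderr, "%s\n", last_error);
}
```

For example, if the parser passes the message "syntax error, unexpected NUM" while current_line is 3, this prints line 3: syntax error, unexpected NUM.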

If you really want custom error messages, read the documentation about YYERROR, write rules that match the erroneous patterns you want to catch, and raise the errors yourself from their actions. For instance, here division by zero is treated as a syntax error (which is dubious, but it shows how to emit a custom message with location information).

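 /* Assumes precedence declarations such as %left '+' '-' and
    %left '*' '/' in the declarations section. */
 exp:
   NUM           { $$ = $1; }
 | exp '+' exp   { $$ = $1 + $3; }
 | exp '-' exp   { $$ = $1 - $3; }
 | exp '*' exp   { $$ = $1 * $3; }
 | exp '/' exp
     {
       if ($3)
         $$ = $1 / $3;
       else
         {
           /* @3 is the location of the divisor. */
           fprintf (stderr, "%d.%d-%d.%d: division by zero\n",
                    @3.first_line, @3.first_column,
                    @3.last_line, @3.last_column);
           /* Start error recovery as if the parser had detected a
              syntax error; YYERROR itself does not call yyerror. */
           YYERROR;
         }
     }
 ;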
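The @3.first_line style fields in the division-by-zero action come from Bison's default location type, which has four int members: first_line, first_column, last_line and last_column. If you report several custom errors this way, a small helper keeps the format consistent. This is only a sketch: location_t is a hypothetical stand-in for the YYLTYPE that Bison generates, and report_at and last_report are hypothetical names:

```c
#include <stdio.h>

/* Hypothetical stand-in with the same fields as Bison's default YYLTYPE;
   in a real parser you would use the generated YYLTYPE instead. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} location_t;

/* Hypothetical buffer holding the last formatted diagnostic. */
char last_report[256];

/* Prints "FL.FC-LL.LC: msg" to stderr, the same layout as the
   division-by-zero action, and keeps a copy in last_report. */
void report_at(const location_t *loc, const char *msg)
{
    snprintf(last_report, sizeof last_report, "%d.%d-%d.%d: %s",
             loc->first_line, loc->first_column,
             loc->last_line, loc->last_column, msg);
    fprintf(stderr, "%s\n", last_report);
}
```

With location_t replaced by YYLTYPE, the action could call report_at(&@3, "division by zero") before YYERROR; a divisor spanning columns 7 to 9 of line 2 would then be reported as 2.7-2.9: division by zero.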

Note also that giving tokens string aliases produces more readable error messages (with verbose error messages enabled). With

 %token NUM

the parser reports "unexpected NUM", whereas with

 %token NUM "number"

it reports "unexpected number".