7
votes

I'm currently trying to write a small compiler using Flex+Bison, but I'm kind of lost when it comes to error handling, especially how to make everything fit together. To motivate the discussion, consider the following lexer fragment I'm using for string literals:

["]          { BEGIN(STRING_LITERAL); init_string_buffer(); }
<STRING_LITERAL>{
    [^\\"\n] { add_char_to_buffer(yytext[0]); /* ordinary character */ }
    \\\\    { add_char_to_buffer('\\'); }
    \\\"    { add_char_to_buffer('\"'); }
    \\.     { /*Invalid escape. How do I treat this error?*/ }
    ["]     { BEGIN(INITIAL); yylval = get_string_buffer(); return TK_STRING; }
}

How do I handle the situation with invalid escapes? Right now I'm just printing an error message and calling exit but I'd prefer to be able to keep going and detect more than one error per file if possible.

My questions:

  • What function do I use to print out error messages? The same yyerror expected by bison later on? Where do I put the definition of yyerror if I have separate files for the lexer and parser?
  • What token code should I return from my action? 0 for "end of file"? Some special TK_INVALID_STRING token?
  • How do I make sure the parser can continue parsing after lexical errors (invalid literals, stray punctuation characters, etc)?

3 Answers

10
votes

There are lots of options. Which one is best is probably a matter of opinion. (And note that SO does not take kindly to questions whose answers are opinions rather than facts.)

It largely depends on how you handle error messages in your application in general. But here are a couple of possibilities:

  1. Print an error message directly from the lexer. Tell your error-detection system that compilation was unsuccessful: you might use a global error count (yuk, globals!), or a shared data structure passed to yylex as an additional parameter. Then just ignore the character and continue lexing.

  2. Return something like TK_INVALID_STRING to the parser. The parser will need suitable error productions to handle and recover from this error, which is a lot more work but has the advantage of putting all error handling into the parser. However, in the particular case of strings, you'll probably want to finish lexing the string up to the closing quote; otherwise, continuing the parse will be fruitless.
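A minimal sketch of option 1, written in plain C rather than flex so it can run on its own. It assumes a hypothetical struct error_ctx that is handed to the scanner in place of a global counter; report_error and scan_string are made-up names for illustration. It also follows the advice about strings: after an invalid escape it keeps scanning until the closing quote, so the parser still sees one string token.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical context passed to the scanner (e.g. as an extra yylex
   parameter) instead of a global error counter. */
struct error_ctx {
    const char *filename;
    int error_count;
};

/* Print a diagnostic and record that compilation has failed. */
static void report_error(struct error_ctx *ctx, const char *msg)
{
    fprintf(stderr, "%s: error: %s\n", ctx->filename, msg);
    ctx->error_count++;
}

/* Scan the body of a string literal; p points just past the opening quote.
   Like the flex rules in the question, \\ and \" are translated, but an
   invalid escape is reported and dropped instead of calling exit(), and
   scanning continues up to the closing quote.  Returns a pointer just past
   the closing quote, or NULL if the input ends before the string closes. */
static const char *scan_string(const char *p, char *buf, size_t bufsize,
                               struct error_ctx *ctx)
{
    size_t n = 0;

    assert(bufsize > 0);
    while (*p != '\0' && *p != '"') {
        char c = *p++;
        if (c == '\\') {
            if (*p == '\0')
                break;              /* backslash at end of input */
            c = *p++;
            if (c != '\\' && c != '"') {
                report_error(ctx, "invalid escape sequence in string literal");
                continue;           /* skip the bad escape, keep lexing */
            }
        }
        if (n + 1 < bufsize)        /* silently truncate over-long strings */
            buf[n++] = c;
    }
    buf[n] = '\0';

    if (*p != '"') {
        report_error(ctx, "unterminated string literal");
        return NULL;
    }
    return p + 1;
}
```

In the flex version the same behaviour comes from making the \\. action report the error and not return anything: the scanner stays in the STRING_LITERAL state and carries on until the closing quote.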

As to yyerror: there is nothing magical about it. That function is completely your responsibility; the only thing bison does is call it with a specified set of arguments. If you find it useful for recording errors noticed in the lexer (and I think it probably is), then go ahead and use it. Since you're responsible for declaring yyerror, put its declaration in whatever shared header file you #include in both the lexer and the parser, or fiddle around with bison's code-generation options (for example a %code provides block) to get the declaration included in the header file that bison generates. Whichever is easier. Once you've declared yyerror, you can define it anywhere you want: in the lexer file, in the bison file, or (my preference) in a separate library of support functions.
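A sketch of that layout, with both files shown in one block. The file names errors.h and errors.c and the helpers lex_error and errors_seen are hypothetical; yyerror(const char *) is the signature Bison calls by default.

```c
#include <stdarg.h>
#include <stdio.h>

/* ---- errors.h: hypothetical header #included by both the .l and .y files ---- */
void yyerror(const char *msg);               /* Bison's default yyerror signature */
void lex_error(int line, const char *fmt, ...);
int errors_seen(void);

/* ---- errors.c: hypothetical support file holding the definitions ---- */
static int error_count;                      /* not visible outside errors.c */

/* Prints a diagnostic and counts it; Bison calls this with e.g. "syntax error". */
void yyerror(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    error_count++;
}

/* printf-style helper for lexer actions: prefixes the line number and
   funnels the message through yyerror, so every error is counted in one place. */
void lex_error(int line, const char *fmt, ...)
{
    char msg[256];
    va_list ap;
    int len = snprintf(msg, sizeof msg, "line %d: ", line);

    if (len < 0 || (size_t)len >= sizeof msg)
        len = 0;                             /* prefix failed; drop it */
    va_start(ap, fmt);
    vsnprintf(msg + len, sizeof msg - (size_t)len, fmt, ap);
    va_end(ap);
    yyerror(msg);
}

/* Lets the driver decide after parsing whether compilation succeeded. */
int errors_seen(void)
{
    return error_count;
}
```

A flex rule such as \\. could then call lex_error(yylineno, "invalid escape \\%c", yytext[1]) (yylineno requires %option yylineno). If you use %parse-param, Bison passes those extra arguments to yyerror too, so the declaration has to change to match.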

(FWIW, I've tried option 2, and it really seems to me like too much work; option 1 has worked fine for me. But tastes vary, and YMMV; I'm not going to defend my choice here, but I don't mind admitting to it.)

2
votes

The simplest thing is to just have a final rule

. return yytext[0];

This covers all the single special characters and all the illegal ones as well. Use the special characters directly in your grammar as character literals, such as ':' and ';'. Then if you get an illegal character, the parser's error handling is invoked, which gives some prospect of recovery. If you handle them in the lexer, all you can do is print an error and ignore them.

It also cuts down the size of your lex file.
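A rough C sketch of why this helps. The toy lexer returns every unrecognised character as its own token code, like the . return yytext[0]; rule, and a hand-written parser imitates what a Bison rule such as stmt: error ';' does: report the error, discard tokens up to the next ';', and carry on. This is not the code Bison generates; the grammar, token names and functions are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token codes.  As in Bison, named tokens are numbered above
   255, so any single character can be returned as its own token code. */
enum { TK_EOF = 0, TK_ID = 258 };

static const char *input;   /* hypothetical input cursor used by next_token */

/* Toy lexer: identifiers get a named token; any other character, legal or
   illegal, is returned as itself, like the flex rule  . return yytext[0]; */
static int next_token(void)
{
    while (isspace((unsigned char)*input))
        input++;
    if (*input == '\0')
        return TK_EOF;
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input))
            input++;
        return TK_ID;
    }
    return (unsigned char)*input++;
}

static void report_unexpected(int tok)
{
    if (tok == TK_EOF)
        fprintf(stderr, "syntax error: unexpected end of input\n");
    else if (tok < 256)
        fprintf(stderr, "syntax error: unexpected '%c'\n", tok);
    else
        fprintf(stderr, "syntax error: unexpected token %d\n", tok);
}

/* Parses a sequence of "ID = ID ;" statements.  On an unexpected token it
   reports the error, discards tokens up to and including the next ';' and
   carries on, roughly like the Bison rule  stmt: error ';'  .
   Returns the number of syntax errors found. */
static int parse_program(const char *src)
{
    static const int stmt[] = { TK_ID, '=', TK_ID, ';' };
    const size_t stmt_len = sizeof stmt / sizeof stmt[0];
    int errors = 0;
    int tok;

    input = src;
    tok = next_token();
    while (tok != TK_EOF) {
        size_t i;
        for (i = 0; i < stmt_len && tok == stmt[i]; i++)
            tok = next_token();
        if (i == stmt_len)
            continue;                   /* statement parsed cleanly */

        errors++;
        report_unexpected(tok);
        while (tok != ';' && tok != TK_EOF)
            tok = next_token();         /* resynchronise on ';' */
        if (tok == ';')
            tok = next_token();
    }
    return errors;
}
```

Because named tokens and character codes can't collide, an illegal '@' simply shows up as an unexpected token, and the parser can resynchronise and go on to report later errors in the same file.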

2
votes

If you are using Bison with C++ output, another option is throwing an exception.

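.   { throw yy::parser::syntax_error("invalid character: " + std::string(yytext, yyleng)); }

(If locations are enabled, the syntax_error constructor also takes a location as its first argument.)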