So I'm writing a compiler in Java using ANTLR, and I'm a little puzzled by how it deals with errors.
The default behavior seems to be to print an error message and then attempt, by means of token insertion and such, to recover from the error and continue parsing. I like this in principle; it means that (in the best case) if the user has committed more than one syntax error, they'll get one message per error, but it'll mention all the errors instead of forcing them to recompile to discover the next one. The default error message is fine for my purposes. The trouble comes when it's done reading all the tokens.
I am, of course, using ANTLR's tree constructors to build abstract syntax trees. While it's nice for the parse to continue through syntax errors so the user can see all the errors, once it's done parsing I want to get an exception or some kind of indication that the input wasn't syntactically valid; that way I can stop the compilation and tell the user "sorry, fix your syntax errors and then try again". What I don't want is for it to spit out an incomplete AST based on what it thinks the user was trying to say, and continue to the next phase of compilation with no indication that anything went wrong (other than the error messages which went to the console and I can't see). Yet by default, it does exactly that.
The Definitive ANTLR Reference offers a technique to stop parsing as soon as a syntax error is detected: override the mismatch
and recoverFromMismatchedSet
methods to throw RecognitionException
s, and add a @rulecatch
action to do the same. This would seem to lose the benefit of recovering from parse errors, but more importantly, it only partially works. If a necessary token is missing (for instance, if a binary operator only has an expression on one side of it), it throws an exception just as expected, but if an extraneous token is added, ANTLR inserts the token that it thinks belongs there and continues on its merry way, producing an AST with no indication of a syntax error except a console message. (To make matters worse, the token it inserted was EOF
, so the rest of the file didn't even get parsed.)
I'm sure I could fix this by, say, adding something like an isValid
field to the parser and overriding methods and adding actions so that, at the end of the parse, it throws an exception if there were any errors. But is there a better way? I can't imagine that what I'm trying to do is unusual among ANTLR users.