ANTLR4 uses a sync-and-return error recovery mechanism. For example, I have the following simple grammar:

grammar Hello;
r  : prefix body ;
prefix: 'hello' ':';
body: INT ID ;
INT: [0-9]+ ;
ID : [a-z]+ ; 
WS : [ \t\r\n]+ -> skip ;

I use the following listener to grab the input:

public class HelloLoader extends HelloBaseListener {
    // Text matched by rule r (skipped whitespace is not included).
    String input;

    @Override
    public void exitR(HelloParser.RContext ctx) {
        input = ctx.getText();
    }
}

The main method in my HelloRunner looks like this:

public static void main(String[] args) throws IOException {
    CharStream input = CharStreams.fromStream(System.in);
    HelloLexer lexer = new HelloLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    HelloParser parser = new HelloParser(tokens);
    ParseTree tree = parser.r();
    ParseTreeWalker walker = new ParseTreeWalker();
    HelloLoader loader = new HelloLoader();
    walker.walk(loader, tree);
    System.out.println(loader.input); 
}

Now if I enter the correct input "hello : 1 morning", I get hello:1morning, as expected.

What about an incorrect input such as "hello ; 1 morning"? I get the following output:

line 1:6 token recognition error at: ';'
line 1:8 missing ':' at '1'
hello<missing ':'>1morning

It seems that ANTLR4 automatically detects the invalid token ";" and discards it; however, it does not insert the ":" where it belongs, and only reports <missing ':'> instead.

My question is: is there a way to make ANTLR automatically fix an error when it finds one? How would this be coded? Do we need other tools?

I think you could try to use a custom error recovery mechanism that extends the existing one... I'm pretty sure ANTLR lets you do this. - Raven

1 Answer

Typically the input for a parser comes from some source file that contains code or text that (supposedly) conforms to some grammar. The usual way to handle syntax errors is to alert the user so that the source file can be corrected.

As the commenter noted, you can install your own error recovery strategy (for example, by subclassing DefaultErrorStrategy and registering it with parser.setErrorHandler(...)). But before trying to insert a single token into the token stream and recover, please consider that it would be a very limited solution. Why? Consider a much richer grammar where, for a given token, many (perhaps dozens or hundreds of) other tokens can legally follow it. How would a single-token replacement strategy work then?
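To make the ambiguity concrete, here is a minimal sketch in plain Java, independent of ANTLR. The FOLLOW table and the repairAfter helper are hypothetical names invented for illustration: the "hello" entry mirrors the Hello grammar, while the "id" entry imitates what a richer, expression-like grammar might allow. A repair is proposed only when exactly one token can legally follow.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SingleTokenRepair {
    // Hypothetical follow table: for each token, the tokens that may legally come next.
    // "hello" mirrors the Hello grammar; "id" imitates a richer grammar (an assumption).
    static final Map<String, List<String>> FOLLOW = Map.of(
        "hello", List.of(":"),
        "id", List.of("=", "(", ".", "[", ";", "+", "-")
    );

    // Returns the token to insert after `previous`, but only when the choice is unambiguous.
    public static Optional<String> repairAfter(String previous) {
        List<String> candidates = FOLLOW.getOrDefault(previous, List.of());
        return candidates.size() == 1
            ? Optional.of(candidates.get(0))
            : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(repairAfter("hello")); // prints Optional[:]
        System.out.println(repairAfter("id"));    // prints Optional.empty
    }
}
```

After 'hello' the fix is obvious, because ':' is the only legal continuation. After an identifier there are seven candidates even in this toy table, and nothing in the token stream tells you which one the user meant.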

The hello.g4 example is the epitome of a trivial grammar, the "hello world" of ANTLR. But most of the time, for non-trivial grammars, the best we can do with imperfect syntax is to simply alert the user so the syntax can be corrected.
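For the trivial case in the question only, where the placeholder names a single literal token, one limited workaround is to patch the text after parsing. The sketch below is plain Java and does not touch the parse tree or ANTLR's recovery. MissingTokenPatcher and patch are hypothetical names, and the pattern assumes the placeholder format shown in the question's output (<missing ':'>).

```java
import java.util.regex.Pattern;

public class MissingTokenPatcher {
    // Matches placeholders for quoted literals, e.g. <missing ':'>.
    // Assumes the format seen in the question's output.
    private static final Pattern MISSING = Pattern.compile("<missing '([^']+)'>");

    // Replaces each placeholder with the literal it names.
    // Placeholders without quotes (e.g. <missing INT>) are left untouched,
    // because there is no single text to substitute for them.
    public static String patch(String text) {
        return MISSING.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(patch("hello<missing ':'>1morning")); // prints hello:1morning
    }
}
```

This only works because ':' has exactly one possible spelling. A missing INT or ID has no obvious value to substitute, which is the same limitation described above.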