I have created an example grammar TestGrammar.g4 to illustrate my problem:

grammar TestGrammar;

Member_of : 'MEMBER OF';

Identifier : [a-z]+
           ;

WS: [ \n\t\r]+ -> channel(HIDDEN);

parseSimpleExpression : (Identifier '.')+ Identifier
                      ;

collection_member_expression : parseSimpleExpression Member_of parseSimpleExpression
                            ;

The following code shows the invocation of the parser:

String expression = "x.a MEMBER OF y.a";
TestGrammarLexer l = new TestGrammarLexer(new ANTLRInputStream(expression));
CommonTokenStream tokens = new CommonTokenStream(l);
TestGrammarParser p = new TestGrammarParser(tokens);
p.setErrorHandler(new BailErrorStrategy());
ParserRuleContext ctx = p.parseSimpleExpression();

My expectation is that ANTLR throws a syntax error on the input x.a MEMBER OF y.a, however it does not. Instead it consumes only part of the input (x.a MEMBER OF) and finishes successfully.

Now, when I remove the last rule from the grammar, ANTLR throws a syntax error as expected. I don't understand this behaviour: the last grammar rule should not even be involved in the parsing, since it is neither directly nor indirectly referenced by the start rule.

1 Answer

ANTLR isn't supposed to throw an exception. With the code:

String expression = "x.a MEMBER OF y.a";
TestGrammarLexer l = new TestGrammarLexer(new ANTLRInputStream(expression));
CommonTokenStream tokens = new CommonTokenStream(l);
TestGrammarParser p = new TestGrammarParser(tokens);
p.setErrorHandler(new BailErrorStrategy());
ParserRuleContext ctx = p.parseSimpleExpression();

you're simply instructing ANTLR to parse (Identifier '.')+ Identifier, which it happily does since "x.a" is the first part of the input and matches the parseSimpleExpression production.

The fact that there are more tokens left in the token stream does not matter. If you want to make sure the entire input is consumed, you will need to anchor your rule with the EOF token:

parseSimpleExpression : (Identifier '.')+ Identifier EOF
                      ;
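
To make the effect of the EOF anchor concrete without needing the generated parser classes, here is a minimal hand-written sketch. It is not ANTLR and not how ANTLR is implemented; the class PrefixMatchSketch, its tokenize helper, and the requireEof flag are all hypothetical names made up for this illustration. It only mimics the rule (Identifier '.')+ Identifier, with and without a trailing EOF, on the token sequence from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixMatchSketch {
    // Toy tokenizer (assumption, not ANTLR's lexer): lowercase words,
    // '.', and the keyword "MEMBER OF"; whitespace is skipped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '.') {
                tokens.add(".");
                i++;
            } else if (input.startsWith("MEMBER OF", i)) {
                tokens.add("MEMBER OF");
                i += "MEMBER OF".length();
            } else if (c >= 'a' && c <= 'z') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
        }
        tokens.add("<EOF>"); // the stream always ends with an EOF marker
        return tokens;
    }

    static boolean isIdentifier(String token) {
        return !token.isEmpty() && token.chars().allMatch(ch -> ch >= 'a' && ch <= 'z');
    }

    // Mimics (Identifier '.')+ Identifier, optionally followed by EOF.
    // Returns the number of tokens matched by the identifier part and
    // throws if the rule does not match at the start of the stream.
    static int parseSimpleExpression(List<String> tokens, boolean requireEof) {
        int pos = 0;
        int pairs = 0;
        while (isIdentifier(tokens.get(pos)) && tokens.get(pos + 1).equals(".")) {
            pos += 2;
            pairs++;
        }
        if (pairs == 0 || !isIdentifier(tokens.get(pos))) {
            throw new IllegalStateException("syntax error at token " + pos);
        }
        pos++; // the final Identifier
        if (requireEof && !tokens.get(pos).equals("<EOF>")) {
            throw new IllegalStateException("expected EOF but found '" + tokens.get(pos) + "'");
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("x.a MEMBER OF y.a");
        // Without the anchor, the rule succeeds on the prefix "x.a".
        System.out.println("matched tokens: " + parseSimpleExpression(tokens, false));
        try {
            parseSimpleExpression(tokens, true);
        } catch (IllegalStateException e) {
            // With the anchor, the leftover MEMBER OF token is an error.
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Without the anchor, the rule matches the three tokens x, ., a and stops; with it, the MEMBER OF token that follows is reported as an error, which is the behaviour the question expected.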

Moritz Becker wrote:

... however it does not. Instead it consumes only part of the input (x.a MEMBER OF) and ...

No, that is not true: it only consumes "x.a", as you can verify yourself by printing what the context has matched:

System.out.println(ctx.getText()); // prints "x.a"

And if I remove the collection_member_expression rule from the grammar, regenerate the parser classes, and rerun the code above, it behaves exactly the same.