I'm facing a somewhat strange behavior in error recovery when using semantic predicates.
I need error recovery (specifically the single token insertion) for the texts I'll be parsing have many "single missing token" errors.
I also need semantic predicates because for something like ANTLR4: Matching all input alternatives exaclty once (the second alternative).
But it seems the two don't mix well (I've previously seen this and asked SO for help: ANTLR4 DefaultErrorStrategy fails to inject missing token; then I found an answer; now I don't).
Let the grammar (so simple, it matches any number of "A", separated with spaces, terminated by semi-colon):
grammar AAAGrammar;
WS : ' '+ -> channel(HIDDEN);
A : 'A';
SEMICOLON : ';';
aaaaa :
a* ';'
;
a :
A
;
Running over the following inputs, the resulting parse tree is:
- "A A A;":
(aaaaa (a A) (a A) (a A) ;)
; - "A A A" :
(aaaaa (a A) (a A) (a A) <missing ';'>)
(this one emits a warning on stderr: line 1:5 missing ';' at '').
That's what I expect, and exactly what I want (the missing semi-colon of the 2nd input was correctly injected).
Now a simple change to grammar, introducing a semantic predicate (this one innocuous, but I understand that ANTLR4 does not - and should not - evaluate this) in the "a" rule, to make it:
a :
{true}? A
;
Running it again over the same inputs:
- "A A A;": (aaaaa (a A) (a A) (a A) ;)
;
- "A A A" : (aaaaa (a A) (a A) (a A))
(this one also emits a warning on stderr: line 1:5 no viable alternative at input '').
So semantic predicates totally messed up with missing token injection.
Is this expected?
Why?
Is there any ANTLR4 grammar trick to get error recovery back, without removing the sempred?
Edit: (in reply to @CoronA comment)
Here the diff -u
between generated parsers (without and with semantic predicate):
--- withoutsempred.java 2015-05-04 09:39:22.644069398 -0300
+++ withsempred.java 2015-05-04 09:39:13.400046354 -0300
@@ -56,22 +56,24 @@
public final AaaaaContext aaaaa() throws RecognitionException {
AaaaaContext _localctx = new AaaaaContext(_ctx, getState());
enterRule(_localctx, 0, RULE_aaaaa);
- int _la;
try {
+ int _alt;
enterOuterAlt(_localctx, 1);
{
setState(7);
_errHandler.sync(this);
- _la = _input.LA(1);
- while (_la==A) {
- {
- {
- setState(4); a();
- }
+ _alt = getInterpreter().adaptivePredict(_input,0,_ctx);
+ while ( _alt!=2 && _alt!=-1 ) {
+ if ( _alt==1 ) {
+ {
+ {
+ setState(4); a();
+ }
+ }
}
setState(9);
_errHandler.sync(this);
- _la = _input.LA(1);
+ _alt = getInterpreter().adaptivePredict(_input,0,_ctx);
}
setState(10); match(SEMICOLON);
}
@@ -101,7 +103,9 @@
try {
enterOuterAlt(_localctx, 1);
{
- setState(12); match(A);
+ setState(12);
+ if (!( true )) throw new FailedPredicateException(this, " true ");
+ setState(13); match(A);
}
}
catch (RecognitionException re) {
@@ -115,12 +119,25 @@
return _localctx;
}
+ public boolean sempred(RuleContext _localctx, int ruleIndex, int predIndex) {
+ switch (ruleIndex) {
+ case 1: return a_sempred((AContext)_localctx, predIndex);
+ }
+ return true;
+ }
+ private boolean a_sempred(AContext _localctx, int predIndex) {
+ switch (predIndex) {
+ case 0: return true ;
+ }
+ return true;
+ }
+
public static final String _serializedATN =
- "\3\uacf5\uee8c\u4f5d\u8b0d\u4a45\u78bd\u1b2f\u3378\3\5\21\4\2\t\2\4\3"+
- "\t\3\3\2\7\2\b\n\2\f\2\16\2\13\13\2\3\2\3\2\3\3\3\3\3\3\2\4\2\4\2\2\17"+
- "\2\t\3\2\2\2\4\16\3\2\2\2\6\b\5\4\3\2\7\6\3\2\2\2\b\13\3\2\2\2\t\7\3\2"+
- "\2\2\t\n\3\2\2\2\n\f\3\2\2\2\13\t\3\2\2\2\f\r\7\5\2\2\r\3\3\2\2\2\16\17"+
- "\7\4\2\2\17\5\3\2\2\2\3\t";
+ "\3\uacf5\uee8c\u4f5d\u8b0d\u4a45\u78bd\u1b2f\u3378\3\5\22\4\2\t\2\4\3"+
+ "\t\3\3\2\7\2\b\n\2\f\2\16\2\13\13\2\3\2\3\2\3\3\3\3\3\3\3\3\2\4\2\4\2"+
+ "\2\20\2\t\3\2\2\2\4\16\3\2\2\2\6\b\5\4\3\2\7\6\3\2\2\2\b\13\3\2\2\2\t"+
+ "\7\3\2\2\2\t\n\3\2\2\2\n\f\3\2\2\2\13\t\3\2\2\2\f\r\7\5\2\2\r\3\3\2\2"+
+ "\2\16\17\6\3\2\2\17\20\7\4\2\2\20\5\3\2\2\2\3\t";
public static final ATN _ATN =
ATNSimulator.deserialize(_serializedATN.toCharArray());
static {
I've debugged both codes.
Supposing an input "A A A" (without semi-colon), the version without semantic predicate goes
while (_la==A) {
{
{
setState(4); a();
}
}
setState(9);
_errHandler.sync(this);
_la = _input.LA(1);
}
This block 3 times, then proceeds to
setState(10); match(SEMICOLON);
The match(SEMICOLON)
injects a missing token.
Now note that the version with semantic predicate gets rid of _la = _input.LA(1)
(lookahead) and switches to a more advanced prediction with _alt = getInterpreter().adaptivePredict(_input,0,_ctx)
.
With the very same input, the version with semantic predicate goes:
_alt = getInterpreter().adaptivePredict(_input,0,_ctx);
while ( _alt!=2 && _alt!=-1 ) {
if ( _alt==1 ) {
{
{
setState(4); a();
}
}
}
setState(9);
_errHandler.sync(this);
_alt = getInterpreter().adaptivePredict(_input,0,_ctx);
}
This block 3 times, but it doesn't leave the block unexceptionally. The last _alt = getInterpreter().adaptivePredict(_input,0,_ctx)
throws org.antlr.v4.runtime.NoViableAltException
, skipping the match(SEMICOLON)
entirely.
match(SEMICOLON)
doesn't get a chance to execute, since theadaptativePredict()
throws an exception when there is no viable alternative. I think the interpreter is waiting for either "A" or ";" to decide which way to go; but the simpler lookahead(1) strategy doesn't choke on the missing ';' and lets itsmatch()
proceed normally (thus injecting the missing token). – rslemos