I have a scannerless parser grammar utilizing the CharsAsTokens faux lexer which generates a usable Java Parser class for ANTLR4 versions through 4.6. But when updating to ANTLR 4.7.2 through 4.9.3-SNAPSHOT, the tool generates code producing dozens of compilation errors from the same grammar file, as detailed below.
My question here is simply: Are scannerless parser grammars no longer supported, or must their character-based terminals be specified differently in 4.7 and beyond?
Update:
Unfortunately, I cannot post my complete grammar here as it is derived from FOUO security marking guidance, access to which is restricted by the U.S. government (I am a DoD/IC contractor).
The incompatible upgrade issue however is entirely reproducible with the CSQL.g4 scannerless parser grammar example referred to by Ter in Section 5.6 of The Definitive ANTLR 4 Reference.
As does my grammar, the CSQL example uses CharsAsTokens.java for its tokenizer, and CharVocab.tokens as its token vocabulary.
Note that every token name is specified by its ASCII character-literal equivalent, as in:
'*'=42
'+'=43
and that the parser grammar references quoted token names directly within its rules, as in:
star: '*' ws? ;
plus: '+' ws? ;
The issue here is that ANTLR4 versions 4.2 through 4.6 generated compilable parser classes from such grammars, while ANTLR v4.7.2 and beyond generate Java code with numerous errors.
Here is a snippet from the usable CSQL Java class definition generated by ANTLR v4.6:
    public static class ArgsContext extends ParserRuleContext {
        public List<ArgContext> arg() {
            return getRuleContexts(ArgContext.class);
        }
        public ArgContext arg(int i) {
            return getRuleContext(ArgContext.class,i);
        }
        public ArgsContext(ParserRuleContext parent, int invokingState) {
            super(parent, invokingState);
        }
        @Override public int getRuleIndex() { return RULE_args; }
        @Override
        public void enterRule(ParseTreeListener listener) {
            if ( listener instanceof CSQLListener ) ((CSQLListener)listener).enterArgs(this);
        }
        @Override
        public void exitRule(ParseTreeListener listener) {
            if ( listener instanceof CSQLListener ) ((CSQLListener)listener).exitArgs(this);
        }
    }
And here is the corresponding but now broken code generated by ANTLR v4.7.2:
    public static class ArgsContext extends ParserRuleContext {
        public List<ArgContext> arg() {
            return getRuleContexts(ArgContext.class);
        }
        public ArgContext arg(int i) {
            return getRuleContext(ArgContext.class,i);
        }
        public List<TerminalNode> ','() { return getTokens(CSQL.','); } // line 446
        public TerminalNode ','(int i) { // line 447
            return getToken(CSQL.',', i); // line 448
        } // line 449
        public ArgsContext(ParserRuleContext parent, int invokingState) {
            super(parent, invokingState);
        }
        @Override public int getRuleIndex() { return RULE_args; }
        @Override
        public void enterRule(ParseTreeListener listener) {
            if ( listener instanceof CSQLListener ) ((CSQLListener)listener).enterArgs(this);
        }
        @Override
        public void exitRule(ParseTreeListener listener) {
            if ( listener instanceof CSQLListener ) ((CSQLListener)listener).exitArgs(this);
        }
    }
The numbered lines above are generated only by the newer ANTLR tools (without the added comments), and when compiled result in:
Syntax error on token "','", Identifier expected CSQL.java /CSQL/generated-sources line 446 Java Problem
Syntax error on token "','", delete this token CSQL.java /CSQL/generated-sources line 447 Java Problem
CSQL cannot be resolved to a variable CSQL.java /CSQL/generated-sources line 448 Java Problem
Syntax error on token ".", , expected CSQL.java /CSQL/generated-sources line 448 Java Problem
So why the backwards-incompatible change in ANTLR v4.7+, and how best should I work around it?
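The generated lines 446 to 448 suggest that the newer tool uses each token's name as a Java method and field name, and that a token known only by its quoted literal (such as ',') gets that literal emitted verbatim. That reading is an inference from the output above, not something confirmed against the ANTLR sources. The following minimal sketch only illustrates why such text cannot compile: it checks whether a string is lexically a Java identifier. The class name IdentifierCheck and the token name COMMA are made up for illustration.

```java
final class IdentifierCheck {
    /** Returns true when text is lexically a Java identifier (keywords are not excluded). */
    static boolean isJavaIdentifier(String text) {
        if (text.isEmpty() || !Character.isJavaIdentifierStart(text.charAt(0))) {
            return false;
        }
        for (int i = 1; i < text.length(); i++) {
            if (!Character.isJavaIdentifierPart(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "','" and "'*'" are quoted literals as they appear in CharVocab.tokens;
        // "COMMA" is a hypothetical token name, "arg" is a rule name from the grammar.
        for (String name : new String[] {"','", "'*'", "COMMA", "arg"}) {
            System.out.println(name + " -> " + isJavaIdentifier(name));
        }
    }
}
```

A quoted literal fails the check at its first character, the apostrophe, which matches the "Identifier expected" error reported for line 446.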
CharsAsTokens is a simplified token source, but still a token source (and hence, a scanner). Your problem is probably not about support for "scannerless parsers" (whatever that means in this context), but about changes in newer versions of ANTLR that produce syntax errors in the generated code. So please give us examples of the errors and the relevant parts of your grammar. - Mike Lischke
' '=1 '\n'=2 '\r'=3 ... and parser file
parser grammar ArithmeticParser; options { tokenVocab = ArithmeticLexer; } ws: ( ' ' | '\r' | '\n' )+;
You're going to have to use the token names instead of the literals (ws: (SP | CR | LF);), then you can use 4.9.2. Ideally, you just have a lexer grammar to declare all this, but just don't use the lexer. Then my trfoldlit of Trash can make the changes to your parser grammar automatically. Or just do it by hand. - kaby76
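For the "do it by hand" route, one possible starting point is to give every literal-only entry in the token vocabulary a name. The sketch below is an assumption-laden illustration, not a tested fix: the class VocabRenamer and the CHAR_<type> naming scheme are invented here, and the parser rules would still have to be edited to reference the new names, as the comment above advises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VocabRenamer {
    /**
     * Turns a literal-keyed vocabulary entry such as ','=44 into a named one.
     * CHAR_<type> is a hypothetical naming scheme; lines that do not start
     * with a quote (already named, or blank) are returned unchanged.
     */
    static String toNamedEntry(String line) {
        String trimmed = line.trim();
        // The token type follows the last '=', so the literal '=' itself is handled.
        int eq = trimmed.lastIndexOf('=');
        if (eq <= 0 || !trimmed.startsWith("'")) {
            return trimmed;
        }
        String type = trimmed.substring(eq + 1).trim();
        return "CHAR_" + type + "=" + type;
    }

    /** Applies toNamedEntry to every line of a vocabulary file. */
    static List<String> rename(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(toNamedEntry(line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample entries taken from the question; real input would be the lines of CharVocab.tokens.
        for (String entry : rename(Arrays.asList("'*'=42", "'+'=43", "','=44"))) {
            System.out.println(entry);
        }
    }
}
```

With names like these in place, a rule such as star: '*' ws? ; would be rewritten to use the corresponding name instead of the quoted character. Friendlier names (STAR, PLUS, COMMA) could be substituted through a lookup table.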