Antlr4 (CSharp Target), Grammar = Java.g4 : Generated JavaLexer.cs Does Not Compile

Question

I am using Antlr4cs-4.3.0 with Visual Studio 2012 and .NET 4.5. I have successfully generated and exercised a parser produced from a simple grammar (calculator.g4), so I believe everything is set up properly in Visual Studio. I am now attempting to generate a parser for the Java.g4 grammar, which I obtained from github.com/antlr/grammars-v4/java. The generated JavaLexer.cs file does not compile (see code and errors below), presumably because it contains references to things that exist only in a Java environment.

Any advice will be much appreciated.

Robert

private bool JavaLetterOrDigit_sempred(RuleContext _localctx, int predIndex) {
    switch (predIndex) {
        case 2: return Character.isJavaIdentifierPart(_input.LA(-1));
        case 3: return Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2),
                                                      (char)_input.LA(-1)));
    }
    return true;
}

Error: The name 'Character' does not exist in the current context

Error: 'Antlr4.Runtime.ICharStream' does not contain a definition for 'LA' and no extension method 'LA' accepting a first argument of type 'Antlr4.Runtime.ICharStream' could be found (are you missing a using directive or an assembly reference?)

Bart Kiers · Accepted Answer · 2014-07-18T19:07:15

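That grammar contains embedded Java code (semantic predicates), which ANTLR copies verbatim into the generated lexer regardless of the target language. It is only used in the following rules: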

fragment
JavaLetter
    :   [a-zA-Z$_] // these are the "java letters" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierStart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
    ;

fragment
JavaLetterOrDigit
    :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierPart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
    ;

Either remove the {...}? predicates from those rules:

fragment
JavaLetter
    :   [a-zA-Z$_] // these are the "java letters" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
    ;

fragment
JavaLetterOrDigit
    :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
    ;

and (optionally) perform these checks at a later stage, or replace the Java code with C# code.
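
For the second option, a minimal sketch of what C# predicates might look like is below. It assumes the antlr4cs runtime names the lookahead method `La` rather than `LA` (which would explain the second compiler error), and it uses `char.IsLetter` and `char.IsLetterOrDigit` as rough stand-ins for `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`. These are not exact equivalents: the Java methods also accept currency symbols and connecting punctuation, for example, so treat this as an approximation and check it against your runtime version.

```antlr
fragment
JavaLetter
    :   [a-zA-Z$_] // these are the "java letters" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        // assumption: La(-1) returns the character that was just consumed
        {char.IsLetter((char)_input.La(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        // combine the pair into a code point, then test it via the string overload
        {char.IsLetter(char.ConvertFromUtf32(char.ConvertToUtf32((char)_input.La(-2), (char)_input.La(-1))), 0)}?
    ;

fragment
JavaLetterOrDigit
    :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {char.IsLetterOrDigit((char)_input.La(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {char.IsLetterOrDigit(char.ConvertFromUtf32(char.ConvertToUtf32((char)_input.La(-2), (char)_input.La(-1))), 0)}?
    ;
```

The `char.IsLetter(string, int)` overload is used for the surrogate-pair case because the single-`char` overload cannot classify a code point above U+FFFF.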
