
I am using ANTLR 4.5 to build a parser for a language with several special comment formats, which I would like to send to different channels.

It seems ANTLR 4.5 has been extended with a new construct for declaring extra lexer channels:

Extract from the documentation (https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Lexer+Rules):

As of 4.5, you can also define channel names like enumerations with the following construct above the lexer rules:

    channels { WSCHANNEL, MYHIDDEN }
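As a rough model of what such a block declares: assuming the tool numbers user-defined channels in declaration order starting at `Token.MIN_USER_CHANNEL_VALUE` (2), just after the predefined `DEFAULT_TOKEN_CHANNEL` (0) and `HIDDEN` (1), the example above would give `WSCHANNEL = 2` and `MYHIDDEN = 3` in the generated lexer. The plain-Java sketch below needs no ANTLR runtime; the class `ChannelNumbering` and its `assign` method are hypothetical and only reproduce that assumed numbering.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChannelNumbering {
    // Predefined channels, as in org.antlr.v4.runtime.Lexer.
    static final int DEFAULT_TOKEN_CHANNEL = 0;
    static final int HIDDEN = 1;
    // Assumption: first number available to user channels
    // (Token.MIN_USER_CHANNEL_VALUE in the ANTLR 4 runtime).
    static final int MIN_USER_CHANNEL_VALUE = 2;

    // Hypothetical helper: gives consecutive channel numbers to the names
    // of a channels { ... } block, in declaration order.
    static Map<String, Integer> assign(String... names) {
        Map<String, Integer> channels = new LinkedHashMap<>();
        int next = MIN_USER_CHANNEL_VALUE;
        for (String name : names) {
            if (channels.containsKey(name)) {
                throw new IllegalArgumentException("duplicate channel: " + name);
            }
            channels.put(name, next++);
        }
        return channels;
    }

    public static void main(String[] args) {
        // Mirrors: channels { WSCHANNEL, MYHIDDEN }
        System.out.println(assign("WSCHANNEL", "MYHIDDEN")); // {WSCHANNEL=2, MYHIDDEN=3}
    }
}
```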

My lexing and parsing rules are in a single file, and my code looks like this:

    channels {
       ANNOT_CHANNEL,
       FORMAL_SPEC_CHANNEL,
       DOC_CHANNEL,
       COMMENT_CHANNEL,
       PRAGMAS_CHANNEL
    }

    // ... parsing rules ...

    // expression annotation (sent to a special channel)
    ANNOT: (EOL_ANNOT | LUS_ANNOT | C_ANNOT) -> channel(ANNOT_CHANNEL) ;
    fragment LUS_ANNOT: '(*!' ( COMMENT | . )*? '*)' ;
    fragment C_ANNOT: '/*!' ( COMMENT | . )*? '*/' ;
    fragment EOL_ANNOT: ('--!' | '//!') .*? EOL ;

    // formal specification annotations (sent to a special channel)
    FORMAL_SPEC: ( EOL_SPEC | LUS_SPEC | C_SPEC ) -> channel(FORMAL_SPEC_CHANNEL) ;
    fragment LUS_SPEC: '(*@' ( COMMENT | . )*? '*)' ;
    fragment C_SPEC: '/*@' ( COMMENT | . )*? '*/' ;
    fragment EOL_SPEC: ('--@' | '//@' | '--%') .*? EOL;

    // documentation annotation (sent to a special channel)
    DOC: ( EOL_DOC | LUS_DOC | C_DOC ) -> channel(DOC_CHANNEL) ;
    fragment LUS_DOC: '(**' ( COMMENT | . )*? '*)' ;
    fragment C_DOC: '/**' ( COMMENT | . )*? '*/' ;
    fragment EOL_DOC: ('--*' | '//*') .*? EOL;

    // standard comment (sent to a special channel)
    COMMENT: ( EOL_COMMENT | LUS_COMMENT | C_COMMENT ) -> channel(COMMENT_CHANNEL);
    fragment LUS_COMMENT: '(*' ( COMMENT | . )*? '*)' ;
    fragment C_COMMENT: '/*' ( COMMENT | . )*? '*/' ;
    fragment EOL_COMMENT: ('--' | '//') .*? EOL;

    // pragmas are sent to a special channel
    PRAGMA: '#pragma' CHARACTER* '#end' -> channel(PRAGMAS_CHANNEL);

However, I am still getting these 4.4-style warnings:

    warning(155): Scade6.g4:550:52: rule ANNOT contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
    warning(155): Scade6.g4:556:56: rule FORMAL_SPEC contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
    warning(155): Scade6.g4:562:45: rule DOC contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
    warning(155): Scade6.g4:568:62: rule COMMENT contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output
    warning(155): Scade6.g4:574:47: rule PRAGMA contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output

If I split the lexer and parser into two distinct files and use an import statement to import the lexer into the parser, I still get the same warnings as above.

Using integer constants instead of names in a combined grammar, for example

    -> channel(10000)

yields the following error:

    error(164): Scade6.g4:8:0: custom channels are not supported in combined grammars

If I split the lexer and parser into two files and use integer constants, there are no warnings, but this is not really satisfactory for readability.
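One way to limit the readability cost of that integer workaround, offered as a hypothetical convention rather than an ANTLR feature, is to keep the literals in the separate lexer grammar but name them once in host code, together with a reverse lookup for debugging. The class name `Scade6Channels` and the values 10000 to 10004 are assumptions; they only need to match the `channel(...)` literals written in the lexer grammar, and the two must be kept in sync by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for channel numbers that the separate lexer grammar
// uses as integer literals, e.g. -> channel(10002). Keep both in sync by hand.
public final class Scade6Channels {
    public static final int ANNOT_CHANNEL = 10000;
    public static final int FORMAL_SPEC_CHANNEL = 10001;
    public static final int DOC_CHANNEL = 10002;
    public static final int COMMENT_CHANNEL = 10003;
    public static final int PRAGMAS_CHANNEL = 10004;

    // Reverse lookup; registration also catches accidental duplicate numbers.
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        register(0, "DEFAULT_TOKEN_CHANNEL"); // predefined by the runtime
        register(1, "HIDDEN");                // predefined by the runtime
        register(ANNOT_CHANNEL, "ANNOT_CHANNEL");
        register(FORMAL_SPEC_CHANNEL, "FORMAL_SPEC_CHANNEL");
        register(DOC_CHANNEL, "DOC_CHANNEL");
        register(COMMENT_CHANNEL, "COMMENT_CHANNEL");
        register(PRAGMAS_CHANNEL, "PRAGMAS_CHANNEL");
    }

    private Scade6Channels() {}

    private static void register(int channel, String name) {
        String previous = NAMES.putIfAbsent(channel, name);
        if (previous != null) {
            throw new IllegalStateException(name + " reuses channel " + channel + " of " + previous);
        }
    }

    // Readable name for a channel number, e.g. when dumping tokens while debugging.
    public static String nameOf(int channel) {
        return NAMES.getOrDefault(channel, "channel(" + channel + ")");
    }

    public static void main(String[] args) {
        System.out.println(nameOf(DOC_CHANNEL)); // DOC_CHANNEL
        System.out.println(nameOf(42));          // channel(42)
    }
}
```

The grammar still reads `-> channel(10002)`, but host code can compare `token.getChannel() == Scade6Channels.DOC_CHANNEL` instead of repeating the bare number.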

Is there anything I can do to give the extra channels proper names (with either combined or separate lexer/parser specs; I have no preference)?

Regards,


1 Answer


Is there anything I can do to have extra channels named properly?

Not sure about v4.5 (I have not used it), but in v4.x you could always define channels like this (assuming the Java target):

    grammar MyGrammar;

    @lexer::members {
        public static final int WHITESPACE = 1;
        public static final int COMMENTS = 2;
    }

    // ...the rest of your grammar goes here...

    WS  :   [ \t\n\r]+ -> channel(WHITESPACE) ;  // channel(1)

    SL_COMMENT
        :   '//' .*? '\n' -> channel(COMMENTS)   // channel(2)
        ;
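Once tokens carry these channel numbers, client code usually selects them by channel. With the ANTLR runtime you would typically wrap the lexer in a `CommonTokenStream`, call `fill()`, and check `getChannel()` on each token from `getTokens()`. The dependency-free sketch below models only that filtering step; `ChannelBuckets`, `Tok` and `byChannel` are hypothetical names, and `Tok` is a minimal stand-in for the runtime's `Token`, reusing the `WHITESPACE` and `COMMENTS` values from the members block above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChannelBuckets {
    // Same values as the @lexer::members constants in the grammar above.
    public static final int WHITESPACE = 1;
    public static final int COMMENTS = 2;
    // Equal to Token.DEFAULT_CHANNEL in the ANTLR 4 runtime.
    public static final int DEFAULT_CHANNEL = 0;

    // Hypothetical stand-in for org.antlr.v4.runtime.Token: only text and channel.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }

        int getChannel() {
            return channel;
        }
    }

    // Groups token texts by channel number, keeping input order within each channel.
    static Map<Integer, List<String>> byChannel(List<Tok> tokens) {
        Map<Integer, List<String>> buckets = new LinkedHashMap<>();
        for (Tok t : tokens) {
            buckets.computeIfAbsent(t.getChannel(), k -> new ArrayList<>()).add(t.text);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
                new Tok("x", DEFAULT_CHANNEL),
                new Tok(" ", WHITESPACE),
                new Tok("// note\n", COMMENTS),
                new Tok("y", DEFAULT_CHANNEL));
        Map<Integer, List<String>> buckets = byChannel(tokens);
        System.out.println(buckets.get(DEFAULT_CHANNEL)); // [x, y]
        System.out.println(buckets.get(COMMENTS).size()); // 1
    }
}
```

Note that `WHITESPACE = 1` is the same number as the predefined hidden channel (`Lexer.HIDDEN`, `Token.HIDDEN_CHANNEL`), so those tokens end up exactly where `-> channel(HIDDEN)` would put them.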

If you do not already have "The Definitive ANTLR 4 Reference", I recommend getting hold of it; it will save you a lot of time. The example above is from that book.