
I am new to ANTLR and Java, so this may be a trivial question (hopefully!). I am using ANTLR 3.4. I have a grammar for the lexer:

lexer grammar MyLexer;

options {
  language = Java;
}

COMMENT:
    ( '//' ~('\n'|'\r')* '\r'? '\n'
    | '/*' .* '*/'
    ) {$channel=HIDDEN;};

WS: (' '
     | '\t'
     | '\r'
     | '\n'
     ) {$channel=HIDDEN;};
COLLECTION:    'collection';
BRACE_OPEN:    '{';
BRACE_CLOSE:   '}';

and another for the parser:

parser grammar myParser;

options {
  language = Java;
  tokenVocab = MyLexer;
}

collection_def
scope {
    MyCollection currentCollection;
}
@init {
    $collection_def::currentCollection = new MyCollection(); 
}
@after {

    // There should be a comment preceding this rule. How to get the content of that comment into the commentContent variable?
    $collection_def::currentCollection.setDescription(commentContent);

    ...
}
  : COLLECTION BRACE_OPEN
      ...

    BRACE_CLOSE;

The lexer sends comments to the hidden channel. But I want the parser to extract the text contained in the comment that precedes a specific rule (or a specific token, since the COLLECTION token only appears in the rule above). For example, I want this input:

/* Text describing the collection */
collection {
  item 1;
  item 2;
}

to be parsed to a MyCollection object with its description member variable set to "Text describing the collection".

How can I do this?

Why ANTLR3 and not the new ANTLR4? - Bart Kiers
@BartKiers - This is part of a larger codebase which uses ANTLR3, and porting it to ANTLR4 would be additional work. However, if you have an answer that is ANTLR4-specific, I'd also like to hear it. - Renoa

1 Answer


The token stream holds all the tokens, including those on the hidden channel. Every token that you get from the parser result (e.g. through tree.getToken() if you're using output = AST) knows its position in the token stream (Token.getTokenIndex()). That index is what you need to locate and read the hidden token(s) preceding your token.

All that's left is to get this information to the place where you need it. One way is to fetch the full token list (via CommonTokenStream.getTokens(), if you use a CommonTokenStream between lexer and parser) and pass it to whatever method processes the comments. Another is to post-process the parse result and attach the comment text to it afterwards.
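To make the backward walk over the token list concrete, here is a minimal sketch in plain Java. It deliberately does not depend on the ANTLR runtime: `Tok` is a hypothetical stand-in for org.antlr.runtime.Token (in real code you would read the same data through Token.getType(), Token.getText() and Token.getChannel()), and `CommentLookup`, `precedingComment` and `stripDelimiters` are names invented for this illustration. It assumes the list is the full token list (hidden tokens included) and that the index passed in is the one reported by Token.getTokenIndex() for the COLLECTION token.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.antlr.runtime.Token, for illustration only.
final class Tok {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99; // same value as ANTLR 3's Token.HIDDEN_CHANNEL

    final String type;   // token name; real tokens carry an int type instead
    final String text;
    final int channel;

    Tok(String type, String text, int channel) {
        this.type = type;
        this.text = text;
        this.channel = channel;
    }
}

final class CommentLookup {

    /**
     * Walks backwards from tokenIndex over hidden-channel tokens and returns
     * the text of the nearest COMMENT, or null if a visible token (or the
     * start of the list) is reached first.
     */
    static String precedingComment(List<Tok> tokens, int tokenIndex) {
        for (int i = tokenIndex - 1; i >= 0; i--) {
            Tok t = tokens.get(i);
            if (t.channel != Tok.HIDDEN_CHANNEL) {
                return null; // a visible token sits between the comment and our token
            }
            if ("COMMENT".equals(t.type)) {
                return stripDelimiters(t.text);
            }
            // Otherwise it is hidden whitespace: keep walking.
        }
        return null;
    }

    /** Removes the comment delimiters and surrounding whitespace. */
    static String stripDelimiters(String raw) {
        String s = raw.trim();
        if (s.length() >= 4 && s.startsWith("/*") && s.endsWith("*/")) {
            s = s.substring(2, s.length() - 2);
        } else if (s.startsWith("//")) {
            s = s.substring(2);
        }
        return s.trim();
    }

    public static void main(String[] args) {
        // Token list as the lexer from the question would roughly produce it.
        List<Tok> tokens = Arrays.asList(
            new Tok("COMMENT", "/* Text describing the collection */", Tok.HIDDEN_CHANNEL),
            new Tok("WS", "\n", Tok.HIDDEN_CHANNEL),
            new Tok("COLLECTION", "collection", Tok.DEFAULT_CHANNEL),
            new Tok("WS", " ", Tok.HIDDEN_CHANNEL),
            new Tok("BRACE_OPEN", "{", Tok.DEFAULT_CHANNEL));

        // Index 2 is the COLLECTION token.
        System.out.println(precedingComment(tokens, 2));
    }
}
```

With the real runtime, the list would come from CommonTokenStream.getTokens() and the starting index from the COLLECTION token's getTokenIndex(). Stopping at the first visible token is a design choice of this sketch: it keeps a comment that belongs to an earlier construct from being attached to the collection.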