How do you add imaginary tokens for a separated ANTLR lexer & parser?

Question

I'm building an AST with ANTLR, based on the separated Java6 lexer and parser grammars. The lexer definition lives in Java6Lex.g and produces the tokens the parser consumes. The parser consumes these without a problem, but as I produce the AST I would like to introduce imaginary tokens. However, it seems that ANTLR doesn't like this model.

The parser grammar includes the token vocabulary from the lexer - which should baseline the tokens available to the grammar.

parser grammar Java6Parse;

options {
    tokenVocab=Java6Lex;
    backtrack=true;
    memoize=true;
    output=AST;
    language = CSharp3;
}

Now let's say I want to take fieldDeclaration and turn it into a rooted node using a rewrite rule. I assumed (clearly wrongly) that I could introduce the imaginary token directly into the parser grammar as follows:

fieldDeclaration
    :   modifiers type variableDeclarator (COMMA variableDeclarator)* SEMI
            -> ^(FIELD modifiers type variableDeclarator+)
    ;

However, this simply results in the following error occurring:

reference to undefined token in rewrite rule: FIELD

No problem, I get that: I didn't define it. So I tried to define it in the tokens section of the parser grammar, again thinking (wrongly) that the tokenVocab should provide a baseline.

tokens { FIELD; }

Nope. It seems that even defining a tokens block results in an EarlyExitException and an error indicating that Java6Parse.g has no rules. I figured the parser grammar simply doesn't like tokens being defined in the parser, so I defined it in the lexer. Again, that failed. Then I defined every token in both the lexer and the parser; again, failure.

So, here's what I need to know: is there a way to define an imaginary token when the lexer and parser are separated, and if so, how? If not, is the only option to combine the lexer and parser grammars back into the same file?

Sam Harwell · Accepted Answer · 2013-03-25T15:02:49

You are most likely including the tokens{} block in the wrong location. ANTLR 3 requires that the grammar header elements appear in a particular order. See this Stack Overflow answer for the correct order:

Using @header in ANTLR
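
As a rough sketch of what that ordering looks like for the grammar in the question: the grammar declaration comes first, then options{}, then tokens{}, then any named actions (such as @header or @members), and only then the rules. The layout below reuses the names from the question and has not been run against the CSharp3 target, so treat it as an illustration of placement rather than a verified fix.

```
parser grammar Java6Parse;

// 1. options block comes directly after the grammar declaration
options {
    tokenVocab=Java6Lex;
    backtrack=true;
    memoize=true;
    output=AST;
    language = CSharp3;
}

// 2. tokens block follows options; imaginary tokens are declared here
tokens {
    FIELD;
}

// 3. named actions (for example @header or @members), if any, would go here

// 4. rules come last
fieldDeclaration
    :   modifiers type variableDeclarator (COMMA variableDeclarator)* SEMI
            -> ^(FIELD modifiers type variableDeclarator+)
    ;
```

If tokens{} is placed before options{}, or after a named action or a rule, ANTLR 3 may fail to parse the grammar file itself, which would be consistent with the "no rules" error described in the question.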
