
For example, I define several lexer rules in my grammar:

INT: 'int';
FLOAT: 'float';
...

DIGIT : [0-9];
NUMERIC : (DIGIT+ | DIGIT+ '.' DIGIT+ | '.' DIGIT+ | DIGIT+ '.');
...

I need to somehow mark the keywords ('int', 'float', and some others) so that, when I retrieve tokens via the TokenStream, I can filter them by some custom attribute.

Is this possible?

Right now I see only one way: combine the relevant lexer rules into a single rule.

Update

I tried to apply the first option of the answer below, but ran into the following problem: I get the error 'TOKENNAME is not a recognized token name'.

There was an issue for this case. I applied the recommendation from there:

use

options { tokenVocab = MyLexer; }

instead of

import MyLexer;

and got the error: 'error(114): MyParser.g4:3:23: cannot find tokens file .\MyLexer.tokens'

Here it says, as I understand it, that this may happen when the ANTLR source files (MyParser.g4, MyLexer.g4) are placed in the same directory as the generated package. But I set the output property to a different directory. Maybe I'm misunderstanding something...
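For reference, a minimal split-grammar setup looks like the sketch below; only the file names come from the question, the rule bodies are illustrative:

```antlr
// MyLexer.g4 -- standalone lexer grammar
lexer grammar MyLexer;
INT   : 'int';
FLOAT : 'float';
WS    : [ \t\r\n]+ -> skip;

// MyParser.g4 -- picks up token types from the generated MyLexer.tokens file
parser grammar MyParser;
options { tokenVocab = MyLexer; }
```

ANTLR writes MyLexer.tokens next to the generated lexer code and resolves tokenVocab against its library path, so when the generated output goes to a separate directory, pointing the tool at that directory with -lib (or generating both grammars with the same -o target) is the usual fix for error(114).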

Here is a small example.

Keywords have their own lexer id, which should be enough to identify them reliably. Why do you need another way? – Mike Lischke
I want to split the received terminals into groups to apply different syntax highlighting in a VS language extension. And it would be nice to define some group key in the lexer description in the grammar, if that is possible, of course. – Andrei

1 Answer


Depending on what else you are using the lexer for, there are two avenues you can explore.

  1. The type() lexer command to remap tokens.

    Taking the example from the docs there:

    lexer grammar SetType;
    tokens { STRING }
    DOUBLE : '"' .*? '"'   -> type(STRING) ;
    SINGLE : '\'' .*? '\'' -> type(STRING) ;
    WS     : [ \r\t\n]+    -> skip ;
    

    This allows multiple rules to produce the single type STRING, which is the token type you would receive in your stream.
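    Applied to the keywords from the question, this could look like the following sketch (the KEYWORD token name is an assumption; any name declared in tokens { } works):

    ```antlr
    // All keyword rules are remapped to the single virtual type KEYWORD
    lexer grammar MyLexer;
    tokens { KEYWORD }
    INT   : 'int'   -> type(KEYWORD);
    FLOAT : 'float' -> type(KEYWORD);
    ```

    The trade-off: the parser then sees only KEYWORD and can no longer distinguish 'int' from 'float' by token type, though each token still carries its original text.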

  2. The channel() command, which you can use to mark tokens and then filter them once you have the token stream. This has the benefit of leaving the token stream intact (the tokens keep their original types, just on a different channel) if you still need to parse afterwards.

    Again, borrowing the example from the ANTLR docs:

    BLOCK_COMMENT
        : '/*' .*? '*/' -> channel(HIDDEN)
        ;
    LINE_COMMENT
        : '//' ~[\r\n]* -> channel(HIDDEN)
        ;
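On the consumer side (for example, the VS language extension mentioned in the comments), the grouping itself can also live in plain code: a map from token type to highlight group. This is a sketch; the token type constants below are hypothetical stand-ins for the `public static final int` fields ANTLR generates into MyLexer.java:

```java
import java.util.HashMap;
import java.util.Map;

public class TokenGroups {
    // Hypothetical token type values; the real ones are generated
    // by ANTLR as constants in MyLexer.java.
    static final int INT = 1, FLOAT = 2, NUMERIC = 3;

    // One highlight group per token type; types absent from the
    // map fall back to "default".
    static final Map<Integer, String> GROUPS = new HashMap<>();
    static {
        GROUPS.put(INT, "keyword");
        GROUPS.put(FLOAT, "keyword");
        GROUPS.put(NUMERIC, "number");
    }

    static String groupOf(int tokenType) {
        return GROUPS.getOrDefault(tokenType, "default");
    }

    public static void main(String[] args) {
        System.out.println(groupOf(INT));     // keyword
        System.out.println(groupOf(NUMERIC)); // number
    }
}
```

This keeps the grammar untouched: every token keeps its own type, and the extension alone decides how token types map to highlight groups.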