0
votes

I want to embed some known identifier names into my grammar e.g. the class names of my project are known and I want to tell the lexer what identifiers are known keywords that actually belongs to the class-name token. But since I have a long list of class names (hundreds of names), I don't want to create a class-name lexer rule by listing all the known class name keywords in the rule, that will make my grammar file too large.

Is it possible to place my keywords into a separate file? One possibility I am thinking about is to place the keywords in a java class that will be subclassed by the generated lexer class. In that case, my lexer's semantic predicate can just call a method in custom lexer superclass to verify if the input token matches my long list of names. And my long list can be placed inside that superclass src code.

However, in the ANTLR4 book it says grammar options 'superClass' for combined grammar only set the parser's superclass. How can I set my lexer's superclass if I still want to use combined grammar. Or is there any other better method to put my long list of keywords into a separate "keyword file".

1

1 Answers

1
votes

If you want each keyword to have its own token type, you can do the following:

  1. Add a tokens{} block to the grammar to create tokens for each keyword. This ensures unique token types are created for each of your keywords.

    tokens {
        Keyword1,
        Keyword2,
        ...
    }
    
  2. Create a separate class MyLanguageKeywords similar to the following:

    private static final Map<String, Integer> KEYWORDS =
        new HashMap<String, Integer>();
    static {
        KEYWORDS.put("keyword1", MyLanguageParser.Keyword1);
        KEYWORDS.put("keyword2", MyLanguageParser.Keyword2);
        ...
    }
    
    public static int getKeywordOrIdentifierType(String text) {
         Integer type = KEYWORDS.get(text);
         if (type == null) {
             return MyLanguageParser.Identifier;
         }
    
         return type;
    }
    
  3. Add an Identifier lexer rule to your grammar that handles keywords and identifiers.

    Identifier
        :   [a-zA-Z_] [a-zA-Z0-9_]*
            {_type = MyLanguageKeywords.getKeywordOrIdentifierType(getText());}
        ;