I'm trying to figure out how to use modes in my grammar, and I'm confused about how to import lexer grammars with different modes into a combined grammar. It's hopefully something simple, but I can't figure it out.

Basically, I'm trying to create a grammar that will recognize a regexp string: a string that starts with any non-whitespace character and ends with a newline. Looking at how to use modes in the ANTLR 4 book, I came up with this lexer grammar:

lexer grammar hlex;

REG : REGLIMIT -> more, mode(REG_MODE) ;
REGLIMIT : [ ]* ~[ \t\r\n] ;

mode REG_MODE ;

REGEND : [ ]+'\r'? '\n' ->mode(DEFAULT_MODE) ;
TEXT : . -> more ;

Now I'd like to import this into a combined grammar. I use the following combined grammar (the prefix is something every line needs to start with, but it isn't part of the regexp):

grammar h;
import  hlex ;

value : op=PREFIX REG ;
PREFIX : 'P:' 
       | 'DW:' ;

WS : [ \t\r\n]+ -> skip ;

This is where my problems begin.

I run:

java -classpath ./antlr-4.1-complete.jar org.antlr.v4.Tool h.g4

which says:

warning(125): h.g4:5:18: implicit definition of token 'REG' in parser

This confuses me: REG is defined in the import, so why does ANTLR have to create an implicit definition?

And then when I try to compile *.java, it says:

hLexer.java:75: error: cannot find symbol
                case 1: more(); _mode = REG_MODE;  break;
                                        ^
  symbol:   variable REG_MODE
  location: class hLexer

I'm not sure what I'm missing here. It's probably something really simple, but I can't figure it out.

There was another Stack Overflow question, "Lexer modes from imported grammar is not identified in combined grammar. Compilation error after clicking 'run in testRig' Antlrworks2", which indicated that multi-mode lexer grammar imports are not being handled correctly.

But in that case I'm confused about how to use multiple modes at all. I tried to split the lexer into 2 grammar files:

file hlex1.g4:

lexer grammar hlex1;
import hlex2 ;
REG : REGLIMIT -> more, mode(REG_MODE) ;
REGLIMIT : [ ]* ~[ \t\r\n] ;

and file hlex2.g4:

lexer grammar hlex2 ;
mode REG_MODE ;

REGEND : [ ]+'\r'? '\n' ->mode(DEFAULT_MODE) ;
TEXT : . -> more ;

But ANTLR 4 complains at hlex2.g4, saying it's surprised by an unexpected "mode".

So I'm stumped. Any ideas what I'm missing?

1 Answer

Answering my own question. Self high five!

Reading the XML grammar example got me past the import issue. It turns out I can't import a lexer grammar with multiple modes into a combined grammar, as the previous Stack Overflow answer had mentioned. Instead, I need to declare my other grammar as a parser grammar and, instead of using import, say:

options { tokenVocab=hlex; }
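
A minimal sketch of that two-file layout, reusing the rules from the question. It rests on some assumptions not spelled out above: PREFIX and WS are moved into the lexer grammar (a parser grammar cannot contain lexer rules), the parser grammar keeps the name h, and the parser rule refers to REGEND rather than REG.

```antlr
// hlex.g4 (assumed file name; it must match the grammar name)
lexer grammar hlex;

REG      : REGLIMIT -> more, mode(REG_MODE) ;
REGLIMIT : [ ]* ~[ \t\r\n] ;

// Moved here from the combined grammar (assumption)
PREFIX   : 'P:' | 'DW:' ;
WS       : [ \t\r\n]+ -> skip ;

mode REG_MODE ;

REGEND   : [ ]+ '\r'? '\n' -> mode(DEFAULT_MODE) ;
TEXT     : . -> more ;
```

```antlr
// h.g4 (assumed file name; it must match the grammar name)
parser grammar h;

options { tokenVocab=hlex; }   // take token types from hlex.tokens

// REGEND is used because REG and TEXT end in `more`, so the token
// that finally reaches the parser has type REGEND.
value : op=PREFIX REGEND ;
```

With this split, the lexer grammar should be run through the tool before the parser grammar so that hlex.tokens exists when h.g4 is processed, for example by listing hlex.g4 ahead of h.g4 on the org.antlr.v4.Tool command line.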

There's still some stuff I don't understand, like why, if I have a lexer rule that refers to other lexer rules, I can't seem to refer to it in the parser file, while other rules are accessible.
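
One possible explanation, offered as a guess rather than a confirmed diagnosis: a lexer rule that ends in `more` never produces a token of its own type. The lexer keeps the matched text and carries on, and the token it eventually emits takes the type of the rule that finally matches without `more` or `skip`. The sketch below uses hypothetical names (Strings, LQUOTE, STR, STRING) to show the effect on a quoted string.

```antlr
lexer grammar Strings;

LQUOTE : '"' -> more, mode(STR) ;   // starts a string; no LQUOTE token is emitted
WS     : [ \t\r\n]+ -> skip ;

mode STR ;

STRING : '"' -> mode(DEFAULT_MODE) ; // the token the parser actually sees
TEXT   : . -> more ;                 // accumulates text; no TEXT token is emitted
```

Here a parser rule can only ever match STRING, whose text covers the whole quoted string. If the same applies to the grammar in the question, a parser rule that refers to REG would never match, while one that refers to REGEND would.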