Simple Island Grammar in ANTLR 4: Token Recognition Error

Question

I wasn't able to deduce the answer to my problem from existing posts here on token recognition errors with island grammars, so I hope someone can give me some advice on how to do this correctly.

Basically, I am trying to write a language that contains preprocessor directives. I narrowed my problem down to a very simple example. In my example language, the following should be valid syntax:

@@some preprocessor text
PRINT some regular text

When parsing the code, I want to be able to identify the tokens "some preprocessor text", "PRINT" and "some regular text".

This is the parser grammar:

parser grammar myp;

root: (preprocessor | command)*;
preprocessor: PREPROC PREPROCLINE;
command: PRINT STRINGLINE;

This is the lexer grammar:

lexer grammar myl;

PREPROC: '@@' -> pushMode(PREPROC_MODE);
PRINT: 'PRINT' -> pushMode(STRING_MODE);

WS: [ \t\r\n] -> skip;

mode PREPROC_MODE;

PREPROCLINE:    (~[\r\n])*[\r\n]+ -> popMode;

mode STRING_MODE;

STRINGLINE: (~[\r\n])*[\r\n]+ -> popMode;

When I parse the above example code, I get the following error:

line 1:2 extraneous input 'some preprocessor text\r\n' expecting PREPROCLINE
line 2:5 token recognition error at: ' some regular text'

This error occurs regardless of whether the line "WS: [ \t\r\n] -> skip;" is included in the lexer grammar or not. I guess that if I delimited the tokens PREPROCLINE and STRINGLINE with quotes instead of line endings, it would work (at least I successfully implemented regular strings that way in other languages). But in this particular language, I really want to have the strings without the quotes.

Any help on why this error is occurring, or on how to implement a preprocessor language with unquoted strings, would be much appreciated.

Thanks

GRosenberg · Accepted Answer · 2014-04-14T00:46:00

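Updated: First, the recognition errors occur because your parser grammar needs to reference the tokens defined by the lexer. Add an options block to the parser grammar, where the value of tokenVocab is the name of your lexer grammar (myl in the question, MyLexer here):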

options {
    tokenVocab=MyLexer;
}
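
Applied to the grammars in the question, the parser might look like the sketch below. It assumes the lexer grammar keeps the name myl from the question, so tokenVocab is set to myl. ANTLR reads the token types from the generated myl.tokens file, so the lexer has to be generated before the parser.

```antlr
parser grammar myp;

// Import the token types from the lexer grammar named `myl`.
// Assumption: the lexer grammar is called `myl`, as in the question.
options {
    tokenVocab=myl;
}

// Rules unchanged from the question.
root: (preprocessor | command)*;
preprocessor: PREPROC PREPROCLINE;
command: PRINT STRINGLINE;
```

Without the options block, the parser assigns its own token types to PREPROC, PREPROCLINE, PRINT and STRINGLINE, and these need not agree with the types the lexer emits, which is consistent with the "extraneous input ... expecting PREPROCLINE" message in the question.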

Second, when you generate your lexer and parser, read the warnings that ANTLR emits; they usually point at real problems and should be corrected before proceeding.

Finally, these are all working alternatives, once you add the options block.

XXXX: (~[\r\n])*[\r\n]+ -> popMode;

is a bit cleaner as:

XXXX: .*? '\r'? '\n' -> popMode;

To not include the line endings, try

XXXX: ~[\r\n]+ -> popMode;

The line ending is then left for the WS rule in the default mode to skip.
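
Putting the pieces together, a lexer for the question's language that keeps the line endings out of the tokens could look like the following sketch. It matches the line content with a negated set and leaves the line ending to the WS rule in the default mode. The rules PREPROC_WS and STRING_WS are hypothetical additions, not part of the question or the answer above; they drop the blanks after '@@' and 'PRINT' so that the tokens read "some preprocessor text" and "some regular text" without a leading space.

```antlr
lexer grammar myl;

PREPROC: '@@' -> pushMode(PREPROC_MODE);
PRINT: 'PRINT' -> pushMode(STRING_MODE);

// Skips blanks and the line endings left over after each mode pops.
WS: [ \t\r\n] -> skip;

mode PREPROC_MODE;

// Hypothetical helper: discard blanks between '@@' and the text.
PREPROC_WS: [ \t]+ -> skip;
// First character is non-blank, then everything up to (not including) the line end.
PREPROCLINE: ~[ \t\r\n] ~[\r\n]* -> popMode;

mode STRING_MODE;

// Hypothetical helper: discard blanks between 'PRINT' and the text.
STRING_WS: [ \t]+ -> skip;
STRINGLINE: ~[ \t\r\n] ~[\r\n]* -> popMode;
```

This sketch does not handle an empty line body, for example '@@' followed directly by a line end; in that case the lexer is still inside the mode when it reaches the line ending and reports a token recognition error.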
