ANTLR - string literals in parser rules overrides other rules

Question

I've defined some hex_byte rule which should match two hexadecimal ([a-fA-F0-9]) characters. I use that in several of the rules of my grammar.

hungry.g

grammar hungry;

expr: message NEWLINE;

message
    :   hex_byte specificMessage
    ;

hex_byte 
    :   a=HEX_BYTE 
    ;

specificMessage
    :   '05' lunchRequest
    |   '06' dinnerRequest
    |   '07' brunchRequest
    ;

lunchRequest  : hex_byte*;
dinnerRequest : hex_byte*;
brunchRequest : hex_byte*;



HEX_DIGIT 
    :   '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|'C'|'D'|'E'|'F'
    ;

HEX_BYTE
    :   HEX_DIGIT HEX_DIGIT
    ;

NEWLINE : [\r\n]+;

Input that contains a hex_byte sequence which isn't being used as a string literal in any other parser rules (e.g. FF, 78, 12, etc.) works fine. However, when I introduce input which contains a hex byte which is being used as a string literal in the specificMessage rule (05, 06, 07), then the parsing fails. Why does this failure occur?

Here are a couple examples of parsing input for the expr rule:

780612 produces

successful_parse

0506BB complains:

line 1:0 missing HEX_BYTE at '05'

line 1:2 extraneous input '06' expecting {HEX_BYTE, NEWLINE}

Sam Harwell · Accepted Answer · 2014-02-03T22:52:09

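In ANTLR, a single token has exactly one token type. By using the string literals '05', '06', and '07' in a parser rule, you have implicitly defined three additional token types (anonymous tokens in this case, since no lexer rule matches exactly those literals). These implicit tokens take precedence over HEX_BYTE for the same text, so the lexer emits 05 as a '05' token rather than as a HEX_BYTE. The hex_byte rule then cannot match it, which is why the parser reports "missing HEX_BYTE at '05'".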
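To make the one-type-per-token point concrete, here is a minimal toy model in plain Java. It is not ANTLR's lexer: the class ImplicitTokenDemo and its methods are hypothetical names for illustration, and the input is simply cut into two-character chunks. It only shows which single type each chunk of the two sample inputs would end up with when the literals outrank HEX_BYTE.

```java
import java.util.ArrayList;
import java.util.List;

public class ImplicitTokenDemo {
    // Toy model of the token types in hungry.g. The three literals are checked
    // first, mirroring how the implicit literal tokens win over HEX_BYTE.
    static String tokenType(String text) {
        switch (text) {
            case "05": return "'05'";
            case "06": return "'06'";
            case "07": return "'07'";
            default:
                return text.matches("[0-9a-fA-F]{2}") ? "HEX_BYTE" : "UNKNOWN";
        }
    }

    // Cut the input into two-character chunks and give each exactly one type.
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i + 2 <= input.length(); i += 2) {
            types.add(tokenType(input.substring(i, i + 2)));
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("780612")); // [HEX_BYTE, '06', HEX_BYTE]
        System.out.println(tokenize("0506BB")); // ['05', '06', HEX_BYTE]
    }
}
```

In the second input the first token is '05', not HEX_BYTE, so hex_byte at the start of message has nothing to match. That lines up with the two errors reported in the question.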

You can correct this situation by using semantic predicates instead of introducing the new token types:

specificMessage
  : {"05".equals(_input.LT(1).getText())}? HEX_BYTE lunchRequest
  | {"06".equals(_input.LT(1).getText())}? HEX_BYTE dinnerRequest
  | {"07".equals(_input.LT(1).getText())}? HEX_BYTE brunchRequest
  ;
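The effect of the predicates can be sketched the same way in plain Java. This is a rough illustration under stated assumptions, not generated parser code: PredicateDispatchDemo, hexBytes, selectRequest, and parseMessage are hypothetical names, and the chunking does not validate hex digits. Every chunk stays a HEX_BYTE, and the alternative is chosen by looking at the token's text, as `_input.LT(1).getText()` does in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

public class PredicateDispatchDemo {
    // Hypothetical helper: every two-character chunk is treated as a HEX_BYTE
    // token (no hex validation here, to keep the sketch short).
    static List<String> hexBytes(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 2 <= input.length(); i += 2) {
            tokens.add(input.substring(i, i + 2));
        }
        return tokens;
    }

    // Mirrors the predicates: inspect the token's text, not its type.
    static String selectRequest(String tokenText) {
        if ("05".equals(tokenText)) return "lunchRequest";
        if ("06".equals(tokenText)) return "dinnerRequest";
        if ("07".equals(tokenText)) return "brunchRequest";
        return null; // no predicate holds, so no alternative is viable
    }

    // message : hex_byte specificMessage ;
    static String parseMessage(String input) {
        List<String> tokens = hexBytes(input);
        if (tokens.size() < 2) return "error";
        String request = selectRequest(tokens.get(1));
        if (request == null) return "error";
        // The remaining tokens are the hex_byte* payload of the chosen request.
        return request + tokens.subList(2, tokens.size());
    }

    public static void main(String[] args) {
        System.out.println(parseMessage("0506BB")); // dinnerRequest[BB]
        System.out.println(parseMessage("780612")); // dinnerRequest[12]
    }
}
```

Because 05 is still an ordinary HEX_BYTE in this model, it can serve as the leading hex_byte of message, and 06 selects dinnerRequest purely through its text.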
