ANTLR: Grammar with uppercase rules won't recognize input

Question

I'm quite new to ANTLR and I need an explanation about its behaviour when recognising input string. From what I understood, rules staring by an uppercase letter are lexer rules while the ones starting by lowercase letter are parser rules.
I have the following language and I need to write the grammar accordingly:

So L should accept all the strings composed by:

non empty string with b and/or c
d
non empty string with a and/or c

This is the first grammar I wrote and I don't have problems with it when recognising the input bbbcbcbcbcdaaacacacaaaa:

start   : (alfa 'd' beta);

alfa    : ('b'|'c') | ('b'|'c') alfa;
beta    : ('a'|'c') | ('a'|'c') beta;

WS      : (' '|'\n'|'\t'|'\r')->skip;

But if I change it like the following one, the same string is not recognised anymore (below you can see the debugger result of ANTLRWorks):

start   : (alfa 'd' beta);

alfa    : ALFA_VALS | ALFA_VALS alfa;
beta    : BETA_VALS | BETA_VALS beta;

ALFA_VALS: ('b'|'c');
BETA_VALS: ('a'|'c');
WS      : (' '|'\n'|'\t'|'\r')->skip;

Furthermore, if I change ALFA_VALS and BETA_VALS to alfa_vals and beta_vals respectively, no problems occur.

Could someone provide me an explanation about this behaviour? Because I couldn't find a specific one addressing this problem.
Thank you so much!

I recommend you use ANTLR v4 instead of the old v3 you're using now. — Bart Kiers

Jiri Tousek Jiri Tousek · Accepted Answer · 2018-06-05T10:30:54

The ANTLR lexer matches the longest possible sub-sequence of input, or if it can match the same input using multiple lexer rules, it uses the first rule that matches.

The lexer knows no context and no parser state, and decides based solely on the characters on input.

If you define ALFA_VALS and BETA_VALS in lexer this way, the input 'c' will always be matched as ALFA_VALS token.

ANTLR: Grammar with uppercase rules won't recognize input

2 Answers