ANTLR extraneous input when using words already defined

Question

I'm facing a problem with what looks like a simple grammar:

grammar Test;

init :
    init separator init
    | word;

word :
    ( LETTER )+ ;

separator :
    SPACE OPERATOR SPACE
    | SPACE ;

SPACE : ' '+ ;
LETTER : 'A'..'Z' ;
OPERATOR : 'AND' | 'OR' ;

WS : [\t\r\n]+ -> skip ; // skip tabs and newlines (spaces are matched by SPACE)

If I input the string AOR OR B, I get the error line 1:1 extraneous input 'OR' expecting {<EOF>, SPACE, LETTER}, but I don't understand why. The word rule should match capital letters until it finds a space character, shouldn't it?

The result I expect is the word AOR, the OR operator, and the word B.

Can anyone give me some tips? Thank you in advance!

Bart Kiers · Accepted Answer · 2015-01-16T19:08:39

In your case, the input AOR OR B gets tokenized as follows:

type=LETTER,   text='A'
type=OPERATOR, text='OR'
type=SPACE,    text=' '
type=OPERATOR, text='OR'
type=SPACE,    text=' '
type=LETTER,   text='B'

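The lexer runs independently of the parser and always prefers the longest match: at position 1, OPERATOR matches two characters (OR) while LETTER matches only one (O), so OR becomes an OPERATOR token, which the parser does not expect right after the word A. If you want AOR to be tokenized as a single word, make it a lexer rule instead of a parser rule: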

OPERATOR : 'AND' | 'OR' ;
WORD : 'A'..'Z'+ ; // define after OPERATOR so a standalone AND/OR stays an operator
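To see why both the length and the order of lexer rules matter, here is a minimal Python sketch that models the two rules ANTLR 4's lexer uses to choose a token: the longest match wins, and on a tie the rule defined first wins. It is a simplified simulation under stated assumptions, not ANTLR itself: `tokenize` and the `*_RULES` lists are hypothetical names, the `WS` skip rule is left out, and the regexes are simple enough that Python's `re` matching yields the longest match for this input.

```python
import re

# Hypothetical model of ANTLR 4 lexing (not ANTLR's own code): at each
# position every rule is tried; the longest match wins, and on a tie the
# earlier rule wins. Skip rules such as WS are not modelled here.
QUESTION_RULES = [          # lexer rules from the question, in grammar order
    ("SPACE", r" +"),
    ("LETTER", r"[A-Z]"),
    ("OPERATOR", r"AND|OR"),
]
ANSWER_RULES = [            # WORD as a lexer rule, defined after OPERATOR
    ("SPACE", r" +"),
    ("OPERATOR", r"AND|OR"),
    ("WORD", r"[A-Z]+"),
]
WORD_FIRST_RULES = [        # same rules, but WORD defined before OPERATOR
    ("SPACE", r" +"),
    ("WORD", r"[A-Z]+"),
    ("OPERATOR", r"AND|OR"),
]


def tokenize(rules, text):
    """Return (type, text) pairs for `text`, given (name, regex) rules."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly longer only: on equal length the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for label, rules in [("question", QUESTION_RULES),
                         ("WORD after OPERATOR", ANSWER_RULES),
                         ("WORD before OPERATOR", WORD_FIRST_RULES)]:
        print(label, tokenize(rules, "AOR OR B"))
```

With the question's rules, the OR inside AOR comes out as an OPERATOR token. With WORD defined after OPERATOR, AOR stays one WORD and the standalone OR is still an OPERATOR; with WORD defined first, that standalone OR ties in length and becomes a WORD instead.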
