Antlr hidden channel whitespace problem

Question

I have the following Antlr grammar:

grammar MyGrammar;

doc :   intro planet;
intro   :   'hi';
planet  :   'world';
MLCOMMENT 
    :   '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE : ( 
    (' ' | '\t' | '\f')+
  |
    // handle newlines
    ( '\r\n'  // DOS/Windows
      | '\r'    // Macintosh
      | '\n'    // Unix
    )
    )
 { $channel = HIDDEN; };

In the ANTLRWorks 1.2.3 interpreter, the inputs "hi world", "hi/**/world" and "hi /*A*/ world" work as expected.

However, the input "hiworld", which shouldn't work, is also accepted. How do I make "hiworld" fail? How do I force at least one whitespace character (or comment) between "hi" and "world"?

Note that I've used only MLCOMMENT and WHITESPACE in this example to simplify, but other kinds of comments would be supported.

Well, I don't know Antlr, but wouldn't "doc: intro WHITESPACE planet" or something like this be most obvious? — schnaader
Because the channel WHITESPACE is hidden, that causes a MismatchedTokenException. — luiscubal
So can't you create another whitespace grammar that is not hidden and use it? — schnaader
I can, and I am temporarily using your method, but why would every single tutorial suggest either using the HIDDEN channel or skip() then? — luiscubal

Sam Harwell · Accepted Answer · 2009-07-19T01:54:26

You need to create a general ID token. Because the lexer builds the longest token it can, it would then match the input "hiworld" as a single ID token, since that match is longer than "hi" or "world" alone. The parser expects 'hi' followed by 'world', so it rejects the lone ID. Such a rule might look like:

ID : ('a'..'z' | 'A'..'Z')+;

As an example, that's exactly how parsers for programming languages separate the "do" keyword from "double" (a type keyword that starts with "do") or "done" (a variable name).
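
To illustrate the longest-match idea outside of ANTLR, here is a rough Python sketch. It is not how ANTLR is implemented; the names TOKEN_SPECS, HIDDEN, tokenize and accepts are made up for this example, and the regular expressions only approximate the lexer rules from the question. Hidden tokens are dropped before the "parser" check, which mimics the rule doc : intro planet;.

```python
import re

# Toy token rules, loosely mirroring the grammar in the question.
# Earlier rules win ties, so "hi" alone is HI rather than ID.
TOKEN_SPECS = [
    ("HI", r"hi"),
    ("WORLD", r"world"),
    ("ID", r"[A-Za-z]+"),
    ("MLCOMMENT", r"/\*.*?\*/"),
    ("WHITESPACE", r"[ \t\f]+|\r\n|\r|\n"),
]
# Tokens on the hidden channel never reach the parser.
HIDDEN = {"MLCOMMENT", "WHITESPACE"}


def tokenize(text):
    """Return the visible tokens, taking the longest match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_SPECS:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            # A strictly longer match replaces the current best;
            # on a tie the earlier rule is kept.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"unexpected character at position {pos}")
        name, m = best
        if name not in HIDDEN:
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens


def accepts(text):
    """Mimic the parser rule doc : intro planet; on the visible tokens."""
    return [name for name, _ in tokenize(text)] == ["HI", "WORLD"]
```

With these rules, accepts("hi world"), accepts("hi/**/world") and accepts("hi /*A*/ world") are all true, while accepts("hiworld") is false because the whole input is consumed as one ID token. Removing the ID entry from TOKEN_SPECS makes "hiworld" split into HI and WORLD again, which is the behaviour described in the question.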
