ANTLR treating multiple EOLs as one?

Question

I want to parse a language in which statements are separated by EOLs. I tried this in the lexer grammar (copied from an example in the docs):

EOL : ('\r'? '\n')+ ; // any number of consecutive linefeeds counts as a single EOL

and then used this in the parser grammar:

stmt_sequence : (stmt EOL)* ;

The parser rejected code with statements separated by one or more blank lines.

However, this was successful:

EOL : '\r'? '\n' ;

stmt_sequence : (stmt EOL+)* ;

I'm an ANTLR newbie. It seems like both should work. Is there something about greedy/nongreedy lexer scanning that I don't understand?

I tried this with both 3.2 and 3.4; I'm running the ANTLR IDE in Eclipse Indigo on OS X 10.6.

Thanks.

user1198411 · Accepted Answer · 2012-02-09T21:17:01

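The error was not in the original grammar but in the input data. I was using an editor (in Eclipse) that automatically inserted tabs after an EOL, so my "blank lines" were not really blank. Each tab ended the run of newlines, so the lexer emitted two EOL tokens where the parser rule (stmt EOL)* allowed only one.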

I modified the grammar as follows:

fragment SPACE: ' ' | '\t';

EOL : ( '\r'? '\n' SPACE* )+;

This grammar works as expected.

The lesson here is that one must be careful with whitespace. The lexer may see whitespace in the input that the parser never sees, because it has already been sent to the hidden channel.
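
To make the effect visible outside ANTLR, here is a rough Python sketch that imitates the two EOL rules with regular expressions. It is not ANTLR and does not reproduce ANTLR's lexer algorithm; the helper eol_tokens, the WS and WORD patterns, and the sample input are all made up for illustration, and the WS pattern stands in for a whitespace rule that would go to the hidden channel.

```python
import re

def eol_tokens(text, eol_pattern):
    """Return the EOL tokens a simplified, regex-based lexer emits for `text`.

    WS matches are dropped here, mimicking tokens sent to a hidden channel.
    """
    token_re = re.compile(
        rf"(?P<EOL>{eol_pattern})|(?P<WS>[ \t]+)|(?P<WORD>[^ \t\r\n]+)"
    )
    return [m.group() for m in token_re.finditer(text) if m.lastgroup == "EOL"]

# Two statements separated by a "blank" line that actually contains a tab.
source = "a\n\t\nb\n"

# Original rule: EOL : ('\r'? '\n')+ ;
original = eol_tokens(source, r"(?:\r?\n)+")

# Modified rule: EOL : ( '\r'? '\n' SPACE* )+ ;
modified = eol_tokens(source, r"(?:\r?\n[ \t]*)+")

print(original)  # ['\n', '\n', '\n']
print(modified)  # ['\n\t\n', '\n']
```

With the original pattern there are two EOL tokens between a and b, which (stmt EOL)* cannot match. With the modified pattern the tab is absorbed into a single EOL token, so the parser sees exactly one EOL after each statement.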
