The initial title question was: Why does my lexer rule not work, until I change it to a parser rule? The contents below are related to this question. Then I found new information and changed the title question. Please see my comment!
My ANTLR grammar (only the "Spaces" rule and its use are important):
grammar MyTest;
Space: ' ';
Tab: '\t';
Break: '\n';
Digit: [0-9];
Char: [A-Z\u00C4\u00D6\u00DCa-z\u00E4\u00F6\u00FC\u00DF];
Prefix: '"' | '\'' | '(' | '[';
Suffix: '\u00AF' | '\u002d' | '.' | ',' | ':' | ';' | '!' | '?' | '"' | '\'' | ')' | ']';
Special: [\u005e\u00ac\u2014\u201e\u2022/><§&{}#*~+\\];
Spaces: Space (Space Space?)?;
Sign: Prefix | Suffix | Special ;
LatinNumber
: 'I' ('I' 'I'?)?
| 'I'? 'V' ('I' ('I' 'I'?)?)?
| 'I'? 'X' ('I' ('I' 'I'?)?)? 'V'? ('I' ('I' 'I'?)?)? ;
YearNumber
: '(' '1' '9' Digit Digit ')'
| '[' '1' '9' Digit Digit ']'
| '1' '9' Digit Digit;
OtherNumber
: [1-9] Digit* ;
Numbers
: LatinNumber | YearNumber | OtherNumber;
NormalNumbers
: Prefix? Numbers Suffix?;
Word: Prefix? Char Char+ Suffix?;
line: Break Spaces? ((Word | NormalNumbers) Spaces?)+ ;
myTest: line ;
Example Input:
Something- and Somethingmore at location
Located Somewhere
Dallas, 2012
at. 99.2013(2014)
Some bla blub Text- and Content Examples from Wikipedia The Illinois Centennial half dollar is a commemorative fifty-cent piece struck by the United States Bureau of the Mint in 1918. The obverse side, depicting Abraham Lincoln, was designed by Chief Engraver George T. Morgan; the reverse image, based on the Seal of Illinois, was done by his assistant and successor, John R. Sinnock.
https://en.wikipedia.org/wiki/Illinois_Centennial_half_dollar
Console Output:
line 2:10 extraneous input ' ' expecting {<EOF>, NormalNumbers, Word}
ParseTree:
(myTest (line \n Something- and))
Improved ParseTree:
'- myTest
|- TOKEN[type: 3, text: \n]
|- TOKEN[type: 16, text: Something-]
|- TOKEN[type: 1, text: ]
'- TOKEN[type: 16, text: and]
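To make the token types in this tree easier to read: as far as I understand, ANTLR 4 numbers the lexer rules in the order in which they are defined, starting at 1. The following is only a minimal sketch under that assumption. The list is copied by hand from the grammar above, and the helper names `token_type` and `rule_name` are made up for this illustration; they are not part of ANTLR.

```python
# Lexer rule names in the order they appear in the grammar above.
# Assumption: ANTLR 4 assigns token types 1, 2, 3, ... in this order.
LEXER_RULES = [
    "Space", "Tab", "Break", "Digit", "Char", "Prefix", "Suffix", "Special",
    "Spaces", "Sign", "LatinNumber", "YearNumber", "OtherNumber", "Numbers",
    "NormalNumbers", "Word",
]


def token_type(name):
    """Return the (assumed) token type number for a lexer rule name."""
    return LEXER_RULES.index(name) + 1


def rule_name(type_number):
    """Return the lexer rule name for an (assumed) token type number."""
    return LEXER_RULES[type_number - 1]


if __name__ == "__main__":
    for t in (3, 16, 1):
        print(t, rule_name(t))
```

Read this way, the tree contains Break, Word, Space, Word: the single blank after "Something-" arrived as a Space token (type 1), not as a Spaces token (type 9).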
So the output says there is a problem right after the first "Something-" of my input, where the whitespace follows (in my grammar it is simply called Space). Because my input comes from an OCR source, there can be several consecutive spaces; on the other hand, I need to recognize the spaces, because they carry meaning for the text structure. For this reason I defined the following in my grammar:
Spaces: Space (Space Space?)?;
but this produces the error above: the whitespace is not recognized. When I replace it with a parser rule (lowercase!) in my grammar
spaces: Space (Space Space?)?;
and also here
line: Break spaces? ((Word | NormalNumbers) spaces?)+ ;
the error seems to be solved (subsequent errors appear - not part of this question).
So why does the error go away in this concrete case when I use a parser rule instead of a lexer rule? And in general: when should I use a lexer rule, and when a parser rule?
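My working assumption, which I would be glad to have confirmed or corrected: the ANTLR 4 lexer picks the rule that matches the longest piece of input, and if two rules match the same length, the rule defined first wins. A single blank matches both Space and Spaces with length 1, and Space is defined first, so the parser receives a Space token where `line` expects Spaces. The sketch below only imitates that selection strategy in Python for a simplified subset of my rules. The function `tokenize` and the regular expressions are my own approximation, not ANTLR code.

```python
import re

# Simplified subset of the lexer rules, kept in grammar order,
# because the order decides ties between equally long matches.
RULES = [
    ("Space", re.compile(r" ")),
    ("Break", re.compile(r"\n")),
    ("Spaces", re.compile(r" {1,3}")),
    # Word: Prefix? Char Char+ Suffix?  (approximated with character classes)
    ("Word", re.compile(
        r"""["'(\[]?"""
        r"""[A-Za-z\u00C4\u00D6\u00DC\u00E4\u00F6\u00FC\u00DF]{2,}"""
        r"""[-\u00AF.,:;!?"')\]]?"""
    )),
]


def tokenize(text):
    """Imitate 'longest match wins, first rule wins on a tie'."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on equal length the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


if __name__ == "__main__":
    print(tokenize("\nSomething- and"))
    print(tokenize("ab  cd"))
```

In this imitation, one blank between two words comes out as Space, while two blanks come out as Spaces, because only then is the Spaces match strictly longer. With the parser rule `spaces`, the parser itself combines one to three Space tokens, so this conflict would not arise.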
Thank you, guys!