I am having some trouble handling whitespace. In the following grammar excerpt, I set up the lexer so that whitespace is skipped and never reaches the parser:
ENTITY_VAR
: 'user'
| 'resource'
;
INT : DIGIT+ | '-' DIGIT+ ;
ID : LETTER (LETTER | DIGIT | SPECIAL)* ;
ENTITY_ID : '__' ENTITY_VAR ('_w_' ID)?;
NEWLINE : '\r'? '\n';
WS : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
fragment LETTER : [a-zA-Z];
fragment DIGIT : [0-9];
fragment SPECIAL : ('_' | '#' );
The problem is that I would like to match variable names of the form ENTITY_ID such that the matched string does not contain any whitespace. Writing it as a lexer rule, as I did here, would be sufficient, but I'd like to do it with a parser rule instead, because I want direct access to the two tokens ENTITY_VAR and ID individually from my code, rather than having them squeezed together into a single ENTITY_ID token.
Any ideas, please?
Basically, any solution that lets me access ENTITY_VAR and ID directly would suit me, whether by leaving ENTITY_ID as a lexer rule or by moving it to the parser.
… '__', you switch modes where you don't skip spaces? – Bart Kiers
… entityVar such that when matching against '__' it switches to a mode where the WS lexer rule is disabled? – Riccardo T.
… entityId, not Var – Riccardo T.
… '__', it would switch modes, regardless of whether there is a parser rule that actually uses the '__' token. – Bart Kiers
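For the option of leaving ENTITY_ID as a lexer rule, one possible workaround is to split the token text in the host code after lexing. The following is only a minimal sketch, assuming the Python target (the question does not say which target is used). The names `split_entity_id` and `EntityId` are hypothetical, and the regular expression simply mirrors the ENTITY_ID, ENTITY_VAR, and ID rules from the grammar above.

```python
import re
from typing import NamedTuple, Optional

# Mirrors the lexer rules from the question (an assumption of this sketch):
#   ENTITY_ID  : '__' ENTITY_VAR ('_w_' ID)? ;
#   ENTITY_VAR : 'user' | 'resource' ;
#   ID         : LETTER (LETTER | DIGIT | SPECIAL)* ;   SPECIAL : '_' | '#'
_ENTITY_ID = re.compile(
    r"__(?P<var>user|resource)(?:_w_(?P<id>[A-Za-z][A-Za-z0-9_#]*))?"
)


class EntityId(NamedTuple):
    """Hypothetical container for the two parts of an ENTITY_ID token."""
    entity_var: str
    ident: Optional[str]  # None when the optional '_w_' ID part is absent


def split_entity_id(text: str) -> EntityId:
    """Split the text of an ENTITY_ID token into its ENTITY_VAR and ID parts."""
    match = _ENTITY_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid ENTITY_ID: {text!r}")
    return EntityId(match.group("var"), match.group("id"))
```

Because the lexer rule already guarantees that the token contains no whitespace, the helper only has to take the text apart. In a listener or visitor it would presumably be called on the token text, for example `split_entity_id(ctx.ENTITY_ID().getText())`, assuming the enclosing parser rule references ENTITY_ID exactly once.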