I am attempting to use a lexer mode in ANTLR4 with the following lexer grammar:

STRING: '"' -> pushMode(STRING_MODE);
mode STRING_MODE;
STRING_CONTENTS: ~('"'|'\n'|'\r')+ -> type(STRING);
END_STRING: '"' -> type(STRING), popMode;
STRING_UNMATCHED: . -> type(UNMATCHED);

  • Is there a way to emit a single STRING token covering all the characters matched within the mode, including the character that caused the lexer to enter the mode?
  • When does the mode end?

I am aware that I can also write the string token like so:

STRING: '"' (~["\n\r]|'\\"')* '"';

1) The more lexer command accumulates the matched text into the next token emitted by a rule that does not use more.

For:

STRING: '"' -> more, pushMode(STRING_MODE);

mode STRING_MODE;
    STRING_CONTENTS: ~('"'|'\n'|'\r')+ -> more ;
    END_STRING: '"' -> type(STRING), popMode;

the text matching the STRING and STRING_CONTENTS rules is prepended to that of the END_STRING rule, resulting in a STRING-typed token containing the full text of the string.
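A minimal complete lexer grammar built around these rules may make the behaviour easier to try out. This is a sketch under assumptions: the grammar name StringLexer and the WS, UNMATCHED, ESCAPE and STRING_UNMATCHED rules are hypothetical additions, not part of the question or the answer. ESCAPE mirrors the '\\"' alternative in the question's single-rule version, and STRING_CONTENTS excludes the backslash here so that ESCAPE can claim it.

```antlr
lexer grammar StringLexer;  // hypothetical grammar name

// Default mode. The opening quote is not emitted yet: 'more' carries
// its text over into the next token that is actually emitted.
STRING : '"' -> more, pushMode(STRING_MODE) ;

// Hypothetical rules so the sketch can lex text around strings.
WS        : [ \t\r\n]+ -> skip ;
UNMATCHED : . ;

mode STRING_MODE;

// Plain body characters (no quote, backslash or line break), accumulated.
STRING_CONTENTS : ~["\\\r\n]+ -> more ;

// Hypothetical: a backslash plus the next character, e.g. \" , accumulated.
ESCAPE : '\\' ~[\r\n] -> more ;

// Closing quote: emit everything accumulated as one STRING token.
END_STRING : '"' -> type(STRING), popMode ;

// Hypothetical: a line break inside a string emits the accumulated
// text as an UNMATCHED token and returns to the default mode.
STRING_UNMATCHED : [\r\n] -> type(UNMATCHED), popMode ;
```

With this grammar, the input "a\"b" should come out as a single STRING token whose text includes both surrounding quotes and the escaped quote, rather than as separate tokens for each piece.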

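2) Syntactically, a mode statement covers every lexer rule that follows it, up to the first subsequent encounter of

  • another mode statement
  • EOF (the end of the grammar file)

Parser rules cannot appear there at all, because lexical modes are only allowed in lexer grammars, and a fragment rule placed after a mode statement still belongs to that mode (fragments are never matched on their own, so this rarely matters). At run time, the lexer stays in STRING_MODE until a rule executes popMode, as END_STRING does above.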
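A small sketch of the run-time side of a mode, assuming a hypothetical grammar named TagLexer with hypothetical rules OPEN, TEXT, NAME, TAG_WS and CLOSE: the lexer enters TAG when OPEN executes pushMode(TAG), considers only TAG's rules while there, and returns to the default mode when CLOSE executes popMode.

```antlr
lexer grammar TagLexer;  // hypothetical grammar name

// Default mode: every rule before the first mode statement.
OPEN : '<' -> pushMode(TAG) ;  // switch to TAG at run time
TEXT : ~[<]+ ;                 // any run of characters other than '<'

mode TAG;

// While the lexer is in TAG, only these rules are tried.
NAME   : [a-zA-Z]+ ;
TAG_WS : [ \t]+ -> skip ;
CLOSE  : '>' -> popMode ;      // leave TAG and resume the default mode
```

For the input hi <b> there, this should yield TEXT, OPEN, NAME, CLOSE, TEXT. A character that no rule in TAG matches, such as '/', would be reported as a token recognition error while the lexer is in TAG.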