I'm using ANTLR4 to generate a lexer for a JavaScript preprocessor (basically, it tokenizes a JavaScript file and extracts every string literal).
I started from a grammar originally written for ANTLR3 and ported the relevant parts (only the lexer rules) to v4.
I have just one issue remaining: I don't know how to handle corner cases for RegEx literals, like this:
    log(Math.round(v * 100) / 100 + ' msec/sample');
Here the span
    / 100 + ' msec/
is interpreted as a RegEx literal, because the lexer rule is always active.
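To make the misread concrete, here is a small sketch that applies a naive pattern to the sample line. The pattern is only my rough approximation of the lexer rule (a slash, a first character, more characters up to the next slash, then optional flag characters), not the grammar's actual definition, and the names `naiveRegexLiteral` and `sampleLine` are made up for illustration.

```javascript
// Rough, hypothetical approximation of the regex-literal lexer rule:
//   '/'  first char (not '/', '*' or newline)  any chars up to the next '/'
//   '/'  optional identifier-like flag characters
const naiveRegexLiteral = /\/[^\/*\n][^\/\n]*\/[A-Za-z0-9_$]*/;

const sampleLine = "log(Math.round(v * 100) / 100 + ' msec/sample');";

// With no notion of the previous token, the first '/' (a division operator)
// is taken as the opening delimiter of a regex literal.
const match = sampleLine.match(naiveRegexLiteral);

console.log(match[0]);
```

The match starts at the division operator and runs through the slash inside the string literal, which is exactly the span that should have been lexed as a division, a number, a plus sign and the start of a string.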
What I would like is to incorporate the following logic (this is C# code; I need the JavaScript equivalent, but I don't know how to adapt it):
    /// <summary>
    /// Indicates whether regular expression recognition (yields true) or division expression recognition (yields false) is enabled in the lexer.
    /// These are mutually exclusive, and the decision about which one is active in the lexer is based on the previous on-channel token.
    /// When the previous token can be identified as a possible left operand for a division, this results in false; otherwise true.
    /// </summary>
    private bool AreRegularExpressionsEnabled
    {
        get
        {
            if (Last == null)
            {
                return true;
            }

            switch (Last.Type)
            {
                // identifier
                case Identifier:
                // literals
                case NULL:
                case TRUE:
                case FALSE:
                case THIS:
                case OctalIntegerLiteral:
                case DecimalLiteral:
                case HexIntegerLiteral:
                case StringLiteral:
                // member access ending
                case RBRACK:
                // function call or nested expression ending
                case RPAREN:
                    return false;
                // otherwise OK
                default:
                    return true;
            }
        }
    }
This rule was present in the old grammar as an inline predicate, like this:
    : { AreRegularExpressionsEnabled }?=> DIV RegularExpressionFirstChar RegularExpressionChar* DIV IdentifierPart*
But I don't know how to use this technique in ANTLR4.
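For reference, here is a minimal sketch of how the C# check might look in plain JavaScript, written so it runs without the ANTLR runtime. Everything named here is an assumption for illustration: the numeric values in `TokenType` are placeholders for the constants a generated lexer would define, and `LastTokenTracker` is a hypothetical helper that stands in for whatever hook the lexer offers to remember the previous on-channel token. The sketch assumes tokens expose `type` and `channel` fields and that channel 0 is the default channel.

```javascript
// Placeholder token types; a generated lexer would supply its own constants.
const TokenType = Object.freeze({
  Identifier: 1, NULL: 2, TRUE: 3, FALSE: 4, THIS: 5,
  OctalIntegerLiteral: 6, DecimalLiteral: 7, HexIntegerLiteral: 8,
  StringLiteral: 9, RBRACK: 10, RPAREN: 11, LPAREN: 12, ASSIGN: 13,
});

// Token types that can end the left operand of a division.
const DIVISION_OPERAND_ENDINGS = new Set([
  TokenType.Identifier,
  TokenType.NULL, TokenType.TRUE, TokenType.FALSE, TokenType.THIS,
  TokenType.OctalIntegerLiteral, TokenType.DecimalLiteral,
  TokenType.HexIntegerLiteral, TokenType.StringLiteral,
  TokenType.RBRACK, // member access ending
  TokenType.RPAREN, // function call or nested expression ending
]);

// Same decision as the C# property: regex literals are allowed unless the
// previous on-channel token could be the left operand of a division.
function areRegularExpressionsEnabled(last) {
  if (last === null || last === undefined) {
    return true; // nothing lexed yet
  }
  return !DIVISION_OPERAND_ENDINGS.has(last.type);
}

// Hypothetical helper: remembers the most recent token on the default channel
// and ignores tokens on other channels (whitespace, comments).
class LastTokenTracker {
  constructor() {
    this.last = null;
  }

  record(token) {
    if (token.channel === 0) {
      this.last = token;
    }
    return token;
  }
}

// Usage: after ")" a slash is a division; after "=" it may start a regex.
const tracker = new LastTokenTracker();
tracker.record({ type: TokenType.RPAREN, channel: 0 });
const afterParen = areRegularExpressionsEnabled(tracker.last);
tracker.record({ type: TokenType.ASSIGN, channel: 0 });
const afterAssign = areRegularExpressionsEnabled(tracker.last);
console.log(afterParen, afterAssign);
```

As far as I understand, ANTLR4 dropped the `=>` form and writes semantic predicates as plain `{...}?`, so with the JavaScript target the rule would presumably start with something like `{this.areRegularExpressionsEnabled()}?`, with the tracking done wherever the lexer emits its tokens. I have not verified that wiring, so treat it as an assumption.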
In the ANTLR4 book, there are some suggestions for solving this kind of problem at the parser level (chapter 12.2, context-sensitive lexical problems), but I don't want to use a parser. I just want to extract all the tokens, leave everything untouched except for the string literals, and keep parsing out of my way.
Any suggestion would be really appreciated, thanks!