I'm trying to construct an ANTLR grammar to parse a templating language. That language can be embedded in any text, and its boundaries are marked with opening/closing tags: {{ / }}. So a valid template looks like this:
foo {{ someVariable }} bar
Here foo and bar should be ignored, and the part between the {{ and }} tags should be parsed. I've found this question, which basically has an answer for the problem, except that the tags there are only a single { and }. I've tried to modify the grammar to match two opening/closing characters, but as soon as I do this, the BUFFER rule consumes ALL characters, including the opening and closing brackets. The LD rule is never invoked.
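To make the goal concrete, here is a rough hand-written Java sketch (plain Java, not ANTLR-generated) of the token stream I'd expect for the example template. The class name TemplateTokens, the tokenize method, and the printed token labels are made up purely for illustration; only the token names BUFFER, LD, IDENT and RD come from my grammar.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateTokens {

    // Same character range as the LETTER fragment in the grammar.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Hypothetical helper: splits the input into the tokens I expect.
    // Text outside {{ }} becomes BUFFER; letter runs inside become IDENT.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean insideTag = false;
        int i = 0;
        while (i < input.length()) {
            if (!insideTag) {
                // Everything up to the next "{{" (or end of input) is one BUFFER.
                int open = input.indexOf("{{", i);
                int end = (open < 0) ? input.length() : open;
                if (end > i) {
                    tokens.add("BUFFER(" + input.substring(i, end) + ")");
                }
                if (open >= 0) {
                    tokens.add("LD");
                    insideTag = true;
                    i = open + 2;
                } else {
                    i = end;
                }
            } else if (input.startsWith("}}", i)) {
                tokens.add("RD");
                insideTag = false;
                i += 2;
            } else if (isLetter(input.charAt(i))) {
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("IDENT(" + input.substring(start, i) + ")");
            } else {
                // Skip whitespace inside a tag (this sketch also skips
                // any other unexpected character instead of reporting it).
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo {{ someVariable }} bar"));
    }
}
```

For the template above this prints [BUFFER(foo ), LD, IDENT(someVariable), RD, BUFFER( bar)], which is the sequence I'd like the generated lexer to hand to the parser.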
Does anyone have an idea why the ANTLR lexer consumes all characters in the BUFFER rule when the delimiters are two characters long, but does not consume the delimiters when they are only one character long?
grammar Test;
options {
output=AST;
ASTLabelType=CommonTree;
}
@lexer::members {
private boolean insideTag = false;
}
start
: (tag | BUFFER )*
;
tag
: LD IDENT^ RD
;
LD @after {
// flip the lexer state
insideTag=true;
System.err.println("FLIPPING TAG");
} : '{{';
RD @after {
// flip the state back
insideTag=false;
} : '}}';
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
IDENT : (LETTER)*;
BUFFER : { !insideTag }?=> ~(LD | RD)+;
fragment LETTER : ('a'..'z' | 'A'..'Z');
IDENT : (LETTER)*; (might) cause the lexer to go into an infinite loop. A lexer rule must always match at least 1 character. – Bart Kiers