2
votes

I'm trying to write a grammar for an XML-like language that uses << and >> in place of the < and > characters. This is a partial snippet of the lexer, where TEXT represents the text outside (between) tags:

OPEN  : '<<' ;
CLOSE : '>>' ;
TEXT  : ~[^<]+ ;

The definition for TEXT above is clearly wrong, because it stops at the first occurrence of < even when that < is not followed by another <. I am looking for a way to say "match everything until you encounter a <<" without including the << in the match.

So something like this won't work either:

TEXT  : .*? '<<' ;

Is there a way to accomplish that in ANTLR4?

-- TR

1 Answer

4
votes

No need for lookahead here; the following should do the trick:

TEXT  : ( ~'<' | '<' ~'<' )+ ;

That is: match one or more units, where each unit is either a single character other than <, or a single < followed by a character other than <.
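To make the alternation concrete, here is the same rule placed in a minimal lexer grammar with each alternative annotated. The grammar name TagLexer is made up for illustration, and the trace in the comments is a hand-worked sketch rather than tool output:

```antlr
lexer grammar TagLexer;   // hypothetical name, not from the question

OPEN  : '<<' ;
CLOSE : '>>' ;

TEXT
    : ( ~'<'        // any single character that is not '<'
      | '<' ~'<'    // a lone '<', consumed together with the non-'<' after it
      )+
    ;

// Hand-worked trace for the input:   a < b <<tag
//   TEXT  "a < b "   (the "< " pair is consumed by the second alternative)
//   OPEN  "<<"       (neither TEXT alternative can start at "<<")
//   TEXT  "tag"
```

One edge case to be aware of: the second alternative needs a character after the <, so a single < sitting at the very end of the input is not covered by TEXT.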

By the way, ANTLR's syntax for negated character classes is different: write ~[a-z] rather than [^a-z], for instance. Inside an ANTLR character set the ^ is an ordinary character, so ~[^<] actually excludes both ^ and <.

You may also want to take a look at the XML example grammar: it uses lexer modes to tokenize the inside of a tag differently from the text outside it, which may also prove useful for your grammar.
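A rough sketch of how lexer modes could be applied here, loosely following the shape of the XML example. The mode name INSIDE and the NAME and WS rules are assumptions about what appears inside your tags, so adjust them to your language:

```antlr
lexer grammar TagLexer;            // hypothetical name

// Default mode: text outside tags
OPEN  : '<<' -> pushMode(INSIDE) ;
TEXT  : ( ~'<' | '<' ~'<' )+ ;

// Active only between '<<' and '>>'
mode INSIDE;
CLOSE : '>>' -> popMode ;
NAME  : [a-zA-Z]+ ;                // assumed tag-name syntax
WS    : [ \t\r\n]+ -> skip ;       // assumed: whitespace inside tags is ignored
```

The reason modes help: with all three rules in a single mode, TEXT also accepts > characters, and since the lexer prefers the longest match, TEXT would swallow a following >> instead of letting CLOSE match it. Keeping TEXT out of the INSIDE mode avoids that.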