0
votes

I'm trying to write a lexer rule that would match the following strings: a, aa, aaa, bbbb

The requirement here is that all characters must be the same.

I tried to use this rule: REPEAT_CHARS: ([a-z])(\1)*

But \1 is not valid in ANTLR4. Is it possible to come up with a pattern for this?

1

1 Answer

1
votes

You can't do that with a back-reference in an ANTLR lexer, at least not without target-specific code inside your grammar. And placing code in your grammar is something you should not do (it makes the grammar hard to read, and it ties the grammar to that target language). It is better to do those kinds of checks/validations inside a listener or visitor.
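As a rough sketch of what such a check could look like (written in Python here, not tied to any particular ANTLR target): assume the lexer uses a permissive rule, for example a hypothetical CHARS: [a-z]+, and the listener or visitor receives each token's text as a plain string. The helper name is_repeated_char is made up for illustration.

```python
def is_repeated_char(text: str) -> bool:
    """Return True if text is non-empty and consists of one repeated character."""
    # A set collapses duplicates, so exactly one distinct character remains
    # only when every character in the token is the same.
    return len(text) > 0 and len(set(text)) == 1


# Token texts as a listener/visitor might see them after lexing with [a-z]+
# (the sample inputs are assumptions for this sketch).
for token_text in ["a", "aa", "aaa", "bbbb", "ab"]:
    status = "ok" if is_repeated_char(token_text) else "rejected"
    print(f"{token_text}: {status}")
```

The grammar stays target-independent this way, and the "all characters are the same" rule lives in ordinary code where it is easy to test.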

Things like back-references and look-arounds are features that crept into the regex engines of programming languages. The regular expression syntax available in ANTLR (and in all parser generators I know of) does not support those features; it only describes true regular languages.

Many features found in virtually all modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory.

-- https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
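To illustrate the quoted point, here is a small sketch using Python's re module, which does support back-references. The pattern from the question works there, and a pattern like (.+)\1 matches the "squares" mentioned in the quote. This only shows what a regex library can do; it is not something an ANTLR lexer rule can express.

```python
import re

# The pattern from the question: one letter, then repeats of that same letter.
repeat_chars = re.compile(r"([a-z])\1*")

# A "square": some non-empty string followed immediately by a copy of itself.
square = re.compile(r"(.+)\1")

# fullmatch requires the whole string to match, like a single lexer token would.
for s in ["a", "aaa", "bbbb", "ab"]:
    print(s, bool(repeat_chars.fullmatch(s)))

for s in ["papa", "WikiWiki", "wiki"]:
    print(s, bool(square.fullmatch(s)))
```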