Formal language expressiveness of Perl patterns

Question

Classical regular expressions are equivalent to finite automata. Most current implementations of "regular expressions" are not strictly speaking regular expressions but are more powerful. Some people have started using the term "pattern" rather than "regular expression" to be more accurate.

What is the formal language classification of what can be described with a modern "regular expression" such as the patterns supported in Perl 5?

Update: By "Perl 5" I mean that pattern matching functionality implemented in Perl 5 and adopted by numerous other languages (C#, JavaScript, etc) and not anything specific to Perl. I don't want to consider, for example, tricks for embedding Perl code in a pattern.

Actually, "regex" is the preferred term for these mutant hybrids; "pattern" doesn't convey enough information. In Perl 6 they've been replaced with "Rules" (which can be assembled into "Grammars"), but "regex" is still accepted, too. — Alan Moore

P Shved P Shved · Accepted Answer · 2009-12-07T15:18:33

Perl regexps, as those of any pattern language, where "backreferences" are allowed, are not actually "regular".

Backreferences is the mechanism of matching the same string that was matched by a sub-pattern before. For example, /^(a*)\1$/ matches only strings with even number of as, because after some as there should follow the same number of those.

It's easy to prove, that, for instance, pattern /^((a|b)*)\1$/ matches words from a non-regular language(*), so it's more expressive that ant finite automaton. Regular expressions can't "remember" a string of arbitrary length and then match it again (the length may be very long, while finite-state machine only can simulate finite amount of "memory").

A formal proof would use the pumping lemma. (By the way, this language can't be described by context-free grammar as well.)

Let alone the tricks that allow to use perl code in perl regexps (non-regular language of balanced parentheses there).

(*) "Regular languages" are sets of words that are matched by finite automata. I already wrote an answer about that.

Formal language expressiveness of Perl patterns

3 Answers