1
votes

This is a theoretical Computer Science question (Computation Theory).

I know that RegExps can take a very long time to calculate. However, from Theory of Computation we know that matching with a Regular Expression can be done extremely fast in a few clock cycles.

If RegExps are equivalent to Finite Automata, why RegExps have (or require) a timeout method? Using a DFA, the computation time for matching can be exteremely fast.

By RegExps I mean the Regular Expressions pattern matching classes in major languages; JavaScript, C#, etc.

Are common RegExps ("regex"s) not equivalent to the Regular Expressions in Theory of Automata (i.e. Regular Languages)?

For examples see: How do I timeout Regex operations to prevent hanging in .NET 4.5? and Regex Pattern Catastrophic backtracking .

If Regexp's matching require Backtracking, it means they are not equivalent to Regular Expressions.

If the languages captured by "Regexp"s are not Regular Languages, historically why (out of which necessity) were they extended?

If it that the resulting DFA will require a huge set of states?

3
Normally matching is fast, but there are some cases with certain regular expressions and long input that can cause it to be very slow.Barmar
This is not a duplicate of that question. This is a theoretical question.Sohail Si
Then it doesn't belong on Stack Overflow, which is about practical programming problems. Theoretical questions belong on Computer Science. Also, the mathematics of regular languages belongs in Mathematics.Barmar
RegExp objects have more features than a DFA. All a DFA can do is accept/reject. But RegExp objects also report captures. Recovering captures requires more work.Raymond Chen
Recovering captures should not take that long one a match (accept/reject) is done.Sohail Si

3 Answers

0
votes

A good reason is catastrophic backtracking, which explains why matching of some regexes will not return before the heat death of the universe.

0
votes

Because regex are not equivalent to the Regular Expressions in Theory of Automata.

They are more like cousins with extra functionalities that make them more complex and sometimes (depending on the regex) impossible to execute on long strings.

-1
votes

(out of which necessity) were they extended?

Regexp implementations were extended in systems in which the lack of a regexp feature requires difficult workarounds, such as writing a substantial amount of code in an inexpressive programming language. There is also the grave risk that the code might turn out to be correct, performant and robust against false positive matches.