
If you have a DFA with m nodes and you run it on a string with n characters, at each step you have to test not only the states inherited from the previous step but also the first state of the DFA again. So after m characters of the string (assuming m < n), the worst case is that you have m active states to test at each step, i.e. mn state tests over the whole string (each state needs only one lookup to advance or be dismissed).

For example, consider the regular expression a{l}b (a repeated l times, followed by a b); its DFA has m = l + 1 nodes. Matching it against a string a{k}b with k > l means you hit the worst case of having (m - 1) active states after l characters of the string.
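A minimal Python sketch of the approach described above, assuming a partial DFA stored as a dict from (state, character) to next state, where a missing entry stands for the dead state. `dfa_a_l_b` and `search_with_restarts` are hypothetical names for illustration; in this encoding states 0..l count the a's read so far and state l + 1 accepts. The function reports the peak number of active states and the total number of table lookups.

```python
def dfa_a_l_b(l):
    """Hypothetical helper: partial DFA for a{l}b over {a, b}.

    States 0..l count the a's read so far; state l + 1 is accepting.
    A missing (state, char) entry means the implicit dead state."""
    delta = {(i, "a"): i + 1 for i in range(l)}
    delta[(l, "b")] = l + 1
    return delta, 0, {l + 1}


def search_with_restarts(delta, start, accept, text):
    """Substring search by re-injecting the start state before every char.

    Returns (found, peak number of active states, total table lookups).
    Stops at the first position where some attempt reaches an accepting state."""
    active = set()
    peak = lookups = 0
    for ch in text:
        active.add(start)  # a new match attempt begins at this position
        peak = max(peak, len(active))
        nxt = set()
        for q in active:
            lookups += 1  # one table lookup per active state
            r = delta.get((q, ch))
            if r is not None:  # None: this attempt hits the dead state
                nxt.add(r)
        active = nxt
        if active & accept:
            return True, peak, lookups
    return False, peak, lookups


print(search_with_restarts(*dfa_a_l_b(3), "aaaaab"))  # (True, 4, 18)
```

With l = 3 and the input aaaaab, the active set settles at l + 1 = 4 states (the l surviving attempts plus the re-injected start state), so every character costs several lookups instead of one.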

What did I miss? Or does the literature hand-wave the practical implementation and only concern itself with the theoretical question of whether a given full string (i.e. not one of its substrings) matches a regular expression?

From where I stand, running an NFA or a DFA takes O(nm) time (with m the number of nodes in the NFA or DFA and n the number of characters). The only difference is that an NFA has more nodes than a DFA.


2 Answers


Historically, DFAs were first defined to match entire strings rather than to search for substrings, which is why the literature typically talks about the time complexity of a DFA with regard to taking in a single string and returning whether the whole string matches or not. If you have a DFA that matches an entire string and you want to use it to search for substrings, then you're essentially running the DFA multiple times, once for each possible start position, which is why you're getting O(mn) as your runtime rather than O(n).

However, if your goal is to match a substring somewhere, you're likely better off redesigning your DFA. Imagine, for example, that you want to match some regex R using a DFA. Rather than building a DFA for R and running it starting at each possible location, build a DFA for the regex Σ*RΣ*. Now, if any substring of the input matches R, the whole string matches Σ*RΣ*, so you only need to run a single pass of the DFA over the string. That drops the runtime down to O(n) (not counting the one-time cost of building that DFA), since you're just running a single pass.
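One way to obtain the Σ*RΣ* DFA without writing it by hand is a lazy subset construction over the DFA for R. The sketch below assumes a dict-based partial DFA (a missing (state, character) entry means the dead state); `make_searcher` and `MATCHED` are hypothetical names. Each new state is the frozenset of R-states still alive, R's start state is re-added on every step (the leading Σ*), and reaching an accepting R-state moves to an absorbing `MATCHED` state (the trailing Σ*).

```python
MATCHED = "MATCHED"  # hypothetical absorbing accept state: the trailing Σ*


def make_searcher(delta, start, accept):
    """Lazily build a DFA for Σ* R Σ* from a partial DFA for R.

    delta:  dict mapping (state, char) -> next state; missing = dead state
    start:  start state of R's DFA
    accept: set of accepting states of R's DFA
    """
    table = {}  # transitions of the new DFA, filled in on demand

    def step(state, ch):
        if state == MATCHED:
            return MATCHED  # once matched, any suffix is fine
        key = (state, ch)
        if key not in table:
            nxt = set()
            # Re-adding R's start state on every step is the leading Σ*.
            for q in state | {start}:
                r = delta.get((q, ch))
                if r is not None:
                    nxt.add(r)
            table[key] = MATCHED if nxt & accept else frozenset(nxt)
        return table[key]

    def search(text):
        # If R accepts the empty string, every text contains a match.
        state = MATCHED if start in accept else frozenset()
        for ch in text:
            state = step(state, ch)  # one transition per character
        return state == MATCHED

    return search


# Hypothetical example: R = a{3}b over {a, b}; states 0..3 count a's, 4 accepts.
l = 3
delta = {(i, "a"): i + 1 for i in range(l)}
delta[(l, "b")] = l + 1
search = make_searcher(delta, 0, {l + 1})
print(search("aaaaab"), search("aab"))  # True False
```

Once a (state, character) pair has been computed, each further character costs a single dictionary lookup, so the scan itself is O(n). The catch is construction: subset construction can, in the worst case, produce exponentially many states relative to R's DFA, which is why the cost of building the DFA is usually counted separately from the O(n) scan.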


If you really have a DFA, you will not have multiple active states. A DFA is defined to have exactly one active state, and each character leads to exactly one next state.

Using this property, you start from the start state and consume n characters. At each character you check:

- if there is no transition to a non-error state => mismatch
- if there is a transition to a non-error state => proceed

At the end check if your current state is a final state. If so => success, else => mismatch.
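The procedure above can be sketched in Python as follows, assuming a partial transition table where a missing (state, character) entry stands for the error state; `dfa_full_match` is a hypothetical name and the (ab)* automaton is only an illustration.

```python
def dfa_full_match(delta, start, accept, text):
    """Run a partial DFA over the whole text, one transition per character.

    delta maps (state, char) -> next state; a missing entry is the error state."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:  # no transition to a non-error state => mismatch
            return False
    return state in accept  # final state reached => success, else mismatch


# Hypothetical example: a two-state DFA for (ab)*, accepting in state 0.
ab_star = {(0, "a"): 1, (1, "b"): 0}
for s in ["", "abab", "aba", "abb"]:
    print(repr(s), dfa_full_match(ab_star, 0, {0}, s))
```

The loop does one table lookup per character, so the running time is O(n) regardless of how many states the DFA has.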

From my point of view, an NFA takes O(n*m) whereas a DFA takes O(n). The DFA's matching time does not depend on the pattern complexity (the number of nodes).

Yet I do not know why you accepted an answer that refers to string search with a DFA (which is indeed not O(n)) instead of string matching with a DFA. But if this is your problem: there are algorithms derived from a DFA that do the job better than searching with Σ*RΣ*, namely Knuth-Morris-Pratt (for single patterns) and Aho-Corasick (for multiple patterns). Their underlying DFA is stored in compressed form, but both algorithms share the property that they keep exactly one current state at any time (rather than multiple active states, as in an NFA) and process the text in O(n) time overall.
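For the single-pattern case, here is a sketch of the Knuth-Morris-Pratt automaton in its uncompressed DFA form, built with the usual restart-state construction; the classic KMP failure function can be seen as a compressed encoding of this table. `kmp_dfa` and `kmp_search` are hypothetical names, and characters that do not occur in the pattern are assumed to send the automaton back to state 0.

```python
def kmp_dfa(pattern):
    """Build the KMP automaton: dfa[j][c] is the next state from state j on c.

    State j means 'the last j characters read equal pattern[:j]'."""
    dfa = [dict.fromkeys(set(pattern), 0)]
    dfa[0][pattern[0]] = 1
    x = 0  # restart state: the state reached on pattern[1:j]
    for j in range(1, len(pattern)):
        dfa.append(dict(dfa[x]))    # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1  # match: advance
        x = dfa[x][pattern[j]]      # update the restart state
    return dfa


def kmp_search(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    dfa, m, state = kmp_dfa(pattern), len(pattern), 0
    for i, ch in enumerate(text):
        # Exactly one transition per character; a character that does not
        # occur in the pattern sends the automaton back to state 0.
        state = dfa[state].get(ch, 0)
        if state == m:
            return i - m + 1
    return -1


print(kmp_search("abab", "abaabab"))  # 3
```

Unlike the failure-function version, which may follow several failure links for one character (O(n) amortized), this table makes exactly one transition per character; the trade-off is space proportional to m times the number of distinct pattern characters.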