20
votes

There are times when being able to limit the pattern matching duration of regex operations could be useful. In particular, when working with user supplied patterns to match data, the pattern might exhibit poor performance due to nested quantifiers and excessive back-tracking (see catastrophic backtracking). One way to apply a timeout is to run the regex asynchronously, but this can be tedious and clutters the code.

According to what's new in the .NET Framework 4.5 Developer Preview it looks like there's a new built-in approach to support this:

Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out.

How can I use this feature? Also, what do I need to be aware of when using it?

Note: I'm asking and answering this question since it's encouraged.

1
Sorry, but this is nothing but self promotion. I would think this is quite off topic for this site.Brandon Moretz
@Brandon you're quick to judge, despite the link in my note that supports this. It's on topic and common, as you can see here (answered by Jeff Atwood), here, here, and in the FAQ. Also, this is hardly self promotion as I succinctly cover everything about the topic here and believe it provides accurate info and is of value; I didn't withhold info and say "for more details please refer to my blog."Ahmad Mageed
@Ahmad I removed my down vote. I guess it's just matter of opinion, but I personally think this is pretty tasteless and all around a very poor way to promote yourself. However, to each his own. Cheers.Brandon Moretz
@BrandonMoretz (and Trevor): Maybe I am missing something, but I see no issue with somebody not only referring to their blog post on the same topic, but also saying "for more details, please refer to my blog" and then actually providing more information there. Provided, of course, that all of the information needed to solve the issue is contained in the answer here (such that the blog could go away and readers still have the full solution). And that was clearly done in this case. The fact is, there is a length limit on answers, and, being a Q & A site, the focus is not on academic deep-dives.Solomon Rutzky
@TrevorBoydSmith (and Brandon): (cont. from comment above) Actually, I see a benefit in people linking to more detailed posts that they wrote (again, as long as all info needed to solve the problem is here on S.O.). Depending on how one got to this page (S.O. search, a search engine, another blog / article / forum / etc), visiting the poster's blog can be quite helpful. I often follow links because that is a great way to learn, whether about the original topic or something else that sounds interesting, there or via links to yet other sites (e.g. the "Related" links to the right -->).Solomon Rutzky

1 Answers

24
votes

I recently researched this topic since it interested me and will cover the main points here. The relevant MSDN documentation is available here and you can check out the Regex class to see the new overloaded constructors and static methods. The code samples can be run with Visual Studio 11 Developer Preview.

The Regex class accepts a TimeSpan to specify the timeout duration. You can specify a timeout at a macro and micro level in your application, and they can be used together:

  • Set the "REGEX_DEFAULT_MATCH_TIMEOUT" property using the AppDomain.SetData method (macro application-wide scope)
  • Pass the matchTimeout parameter (micro localized scope)

When the AppDomain property is set, all Regex operations will use that value as the default timeout. To override the application-wide default you simply pass a matchTimeout value to the regex constructor or static method. If an AppDomain default isn't set, and matchTimeout isn't specified, then pattern matching will not timeout (i.e., original pre-.NET 4.5 behavior).

There are 2 main exceptions to handle:

  • RegexMatchTimeoutException: thrown when a timeout occurs.
  • ArgumentOutOfRangeException: thrown when "matchTimeout is negative or greater than approximately 24 days." In addition, a TimeSpan value of zero will cause this to be thrown.

Despite negative values not being allowed, there's one exception: a value of -1 ms is accepted. Internally the Regex class accepts -1 ms, which is the value of the Regex.InfiniteMatchTimeout field, to indicate that a match should not timeout (i.e., original pre-.NET 4.5 behavior).

Using the matchTimeout parameter

In the following example I'll demonstrate both valid and invalid timeout scenarios and how to handle them:

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";
var timeouts = new[]
{
    TimeSpan.FromSeconds(4),     // valid
    TimeSpan.FromSeconds(-10)    // invalid
};

foreach (var matchTimeout in timeouts)
{
    Console.WriteLine("Input: " + matchTimeout);
    try
    {
        bool result = Regex.IsMatch(input, pattern,
                                    RegexOptions.None, matchTimeout);
    }
    catch (RegexMatchTimeoutException ex)
    {
        Console.WriteLine("Match timed out!");
        Console.WriteLine("- Timeout interval specified: " + ex.MatchTimeout);
        Console.WriteLine("- Pattern: " + ex.Pattern);
        Console.WriteLine("- Input: " + ex.Input);
    }
    catch (ArgumentOutOfRangeException ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine();
}

When using an instance of the Regex class you have access to the MatchTimeout property:

string input = "The English alphabet has 26 letters";
string pattern = @"\d+";
var matchTimeout = TimeSpan.FromMilliseconds(10);
var sw = Stopwatch.StartNew();
try
{
    var re = new Regex(pattern, RegexOptions.None, matchTimeout);
    bool result = re.IsMatch(input);
    sw.Stop();

    Console.WriteLine("Completed match in: " + sw.Elapsed);
    Console.WriteLine("MatchTimeout specified: " + re.MatchTimeout);
    Console.WriteLine("Matched with {0} to spare!",
                         re.MatchTimeout.Subtract(sw.Elapsed));
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine(ex.Message);
}

Using the AppDomain property

The "REGEX_DEFAULT_MATCH_TIMEOUT" property is used set an application-wide default:

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

If this property is set to an invalid TimeSpan value or an invalid object, a TypeInitializationException will be thrown when attempting to use a regex.

Example with a valid property value:

// AppDomain default set somewhere in your application
AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(2));

// regex use elsewhere...
string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    // no timeout specified, defaults to AppDomain setting
    bool result = Regex.IsMatch(input, pattern);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException ex)
{
    sw.Stop();
}
catch (TypeInitializationException ex)
{
    sw.Stop();
    Console.WriteLine("TypeInitializationException: " + ex.Message);
    Console.WriteLine("InnerException: {0} - {1}",
        ex.InnerException.GetType().Name, ex.InnerException.Message);
}
Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Using the above example with an invalid (negative) value would cause the exception to be thrown. The code that handles it writes the following message to the console:

TypeInitializationException: The type initializer for 'System.Text.RegularExpressions.Regex' threw an exception.

InnerException: ArgumentOutOfRangeException - Specified argument was out of the range of valid values. Parameter name: AppDomain data 'REGEX_DEFAULT_MATCH_TIMEOUT' contains an invalid value or object for specifying a default matching timeout for System.Text.RegularExpressions.Regex.

In both examples the ArgumentOutOfRangeException isn't thrown. For completeness the code shows all the exceptions you can handle when working with the new .NET 4.5 Regex timeout feature.

Overriding AppDomain default

Overriding the AppDomain default is done by specifying a matchTimeout value. In the next example the match times out in 2 seconds instead of the default of 5 seconds.

AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT",
                                TimeSpan.FromSeconds(5));

string input = "The quick brown fox jumps over the lazy dog.";
string pattern = @"([a-z ]+)*!";

var sw = Stopwatch.StartNew();
try
{
    var matchTimeout = TimeSpan.FromSeconds(2);
    bool result = Regex.IsMatch(input, pattern,
                                RegexOptions.None, matchTimeout);
    sw.Stop();
}
catch (RegexMatchTimeoutException ex)
{
    sw.Stop();
    Console.WriteLine("Match timed out!");
    Console.WriteLine("Applied Default: " + ex.MatchTimeout);
}

Console.WriteLine("AppDomain Default: {0}",
    AppDomain.CurrentDomain.GetData("REGEX_DEFAULT_MATCH_TIMEOUT"));
Console.WriteLine("Stopwatch: " + sw.Elapsed);

Closing Remarks

MSDN recommends setting a time-out value in all regular expression pattern-matching operations. However, they don't draw your attention to issues to be aware of when doing so. I don't recommend setting an AppDomain default and calling it a day. You need to know your input and know your patterns. If the input is large, or the pattern is complex, an appropriate timeout value should be used. This might also entail measuring your critically performing regex usages to assign sane defaults. Arbitrarily assigning a timeout value to a regex that used to work fine may cause it to break if the value isn't long enough. Measure existing usages before assigning a value if you think it might abort the matching attempt too early.

Moreover, this feature is useful when handling user supplied patterns. Yet, learning how to write proper patterns that perform well is important. Slapping a timeout on it to make up for a lack of knowledge in proper pattern construction isn't good practice.