0
votes

I'm using the C# Regex class to split one string into two parts. The source (input) string is constructed in the following way:

The first part must match PO|P|S|[1-5] (in regex syntax).

The second part can be VP|GZ|GAR|PP|NAD|TER|NT|OT|LO (again, regex syntax). The second part can occur zero or one time.

Acceptable examples are "PO" (one group), "POGAR" (both groups PO+GAR), "POT" (P+OT)...

So I used the following regular expression:

Regex r = new Regex("^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");
Match match = r.Match(potentialToken);

When potentialToken is "PO", it returns 3 groups! How come? I am expecting just one group (first).

match.Groups contains {"PO","PO",""}

Named groups are OK - match.Groups["first"] returns 1 instance, while match.Groups["second"].Success is false.

2
There seems to be a syntax error in your regular expression - there should probably be a < in front of second. Also, have you checked the Success values of the groups when retrieving match.Groups by index? - O. R. Mapper
I've fixed the syntax as you noticed (thanks). The Success values in match.Groups are {"PO",true;"PO",true;"",false} - Milivoj Milani

2 Answers

1
votes

A regex match always has at least one group, "Group 0" at index 0, even if the pattern contains no capturing groups.

"Group 0" is equal to the whole match the regex has made (Match.Value).

So in your case you get 3 groups: "Group 0" + "Group first" + "Group second". As mentioned, "Group second" is optional, so when it doesn't take part in the match the .NET regex engine sets match.Groups["second"].Success to false. There is nothing surprising here. This is the expected behavior.
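As a minimal sketch of the "Group 0" point, the snippet below matches with a pattern that has no capturing groups at all and still sees one group. The class and method names (GroupZeroDemo, CountGroups, WholeMatchEqualsGroupZero) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, only for illustration.
static class GroupZeroDemo
{
    // Number of entries in Match.Groups for the given pattern and input.
    public static int CountGroups(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Groups.Count;
    }

    // True when Groups[0] holds the same text as the whole match.
    public static bool WholeMatchEqualsGroupZero(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success && m.Groups[0].Value == m.Value;
    }
}
```

Calling GroupZeroDemo.CountGroups("PO|P|S|[1-5]", "PO") should give 1, since the pattern has no capturing groups and only "Group 0" is present. Adding the two named groups from the question brings the count to 3.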

1
votes

When using the numbered groups, the first group is always the complete matched (sub)string (cf. docs - "the first element of the GroupCollection object returned by the Groups property contains a string that matches the entire regular expression pattern"), i.e. in your case PO.

The second element in Groups is the capture of your first named group, and the third element is the capture of your second named group - just like the two captures you can retrieve by name. If you check Success of the numbered groups, you will see that the last element (the one matching your second named group) has a Success value of false, as well. You can interpret this as "the group exists, but it did not match anything".

To confirm this, have a look at the output of this testing code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Regex r = new Regex("^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");
        Match match = r.Match("PO");

        for (int i = 0; i < match.Groups.Count; i++) {
            Console.WriteLine(string.Format("{0}: {1}; {2}", i, match.Groups[i].Success, match.Groups[i].Value));
        }
    }
}

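If the goal is to get only the parts that actually matched, one possible approach is to skip index 0 and keep the groups whose Success is true. The following is a sketch under that assumption; TokenSplitter and SplitToken are hypothetical names, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, only for illustration.
static class TokenSplitter
{
    // Same pattern as in the question.
    private static readonly Regex TokenRegex = new Regex(
        "^(?<first>PO|P|S|[1-5])(?<second>VP|GZ|GAR|PP|NAD|TER|NT|OT|LO)?$");

    // Returns the captured parts in order, or an empty list if the input does not match.
    public static List<string> SplitToken(string potentialToken)
    {
        var parts = new List<string>();
        Match match = TokenRegex.Match(potentialToken);
        if (!match.Success)
        {
            return parts;
        }

        // Start at 1: index 0 is always the whole match, not a capture group.
        for (int i = 1; i < match.Groups.Count; i++)
        {
            // An optional group that did not participate has Success == false.
            if (match.Groups[i].Success)
            {
                parts.Add(match.Groups[i].Value);
            }
        }
        return parts;
    }
}
```

With this helper, "PO" yields one part, "POGAR" yields PO and GAR, and "POT" yields P and OT, which lines up with the examples in the question.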