1
votes

I'm stuck on a regex problem in .NET. For example, I have the following regex pattern: (?'group1'A|C)|(?'group2'B|C)|(?'group3'A|B|C)

When I match "AXYZ", I get a Match object that contains Value and Groups. Looking at Groups, only one group has Success set to true: group1 (group3 is false). If I match "BXYZ", only group2 has Success set to true (group3 is again false).

How can I get a match that reports not just one group but all groups that satisfy the match?

For the example above, that would be group1 and group3 for "AXYZ", and group2 and group3 for "BXYZ".

The above is only an example; in the real system there are different patterns (3+ letters each) and more complicated input text (1000+ words).

4
Do you mean group1 & group2 & group3 for "BXYZ"? Because they all match B, right? - mikey
I mean that "BXYZ" should match group2 and group3, but in fact I get only group2. (FYI, if I swap the order of 2 and 3 in the regex pattern, I get group3 instead; it seems to just take the first group.) - Alex M
You could make them separate regular expressions and apply them one at a time. - mbeckish

4 Answers

2
votes

The question seems a little abstract, but if you insist on a single regex you can do something like this, using optional lookaheads:

(?=(?'group1'A|C)?)(?=(?'group2'B|C)?)(?=(?'group3'A|B|C)?)

Lookaheads match without consuming any characters, so the overall match will be empty in this case. The groups inside them still capture, though, so they will be set as you expect, and they may overlap.

Working example: http://ideone.com/PTtQu
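
In case the link goes stale, here is a minimal sketch of how the pattern above could be checked from C#. The helper name SuccessfulGroups is made up for illustration, and the snippet assumes a compiler that accepts top-level statements (C# 9 or later):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern copied from the answer: each named group is optional
// and sits inside its own lookahead.
const string Pattern =
    "(?=(?'group1'A|C)?)(?=(?'group2'B|C)?)(?=(?'group3'A|B|C)?)";

// Hypothetical helper: names of the groups that captured something
// at the position of the first match.
string[] SuccessfulGroups(string input)
{
    Match m = Regex.Match(input, Pattern);
    return new[] { "group1", "group2", "group3" }
        .Where(name => m.Groups[name].Success)
        .ToArray();
}

Console.WriteLine(string.Join(",", SuccessfulGroups("AXYZ"))); // group1,group3
Console.WriteLine(string.Join(",", SuccessfulGroups("BXYZ"))); // group2,group3
```

Because every part of the pattern is optional, Regex.Match always succeeds with an empty match at index 0, so this only inspects the start of the input. For longer text you would probably iterate over Regex.Matches and check the groups of each match.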

1
votes

The regex you have there matches only a single character at a time, and its alternatives are tried from left to right: as soon as one alternative matches at a position, the later ones are never tried there. In your example, 'A' will never be matched by 'group3', because 'group1' always matches it first. Similarly, 'B' will never be matched by 'group3', because 'group2' comes before it.

One way of getting the outcome you require using regexes is to treat each group as a separate regex and use Regex.IsMatch() on each one. For counts, the following C# does what I think you're asking for:

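using System;
using System.Text.RegularExpressions;

string input = "AXYZ";
int count = 0;

// One IsMatch call per group pattern from the question.
count += Regex.IsMatch(input, "A|C") ? 1 : 0;   // group1
count += Regex.IsMatch(input, "B|C") ? 1 : 0;   // group2
count += Regex.IsMatch(input, "A|B|C") ? 1 : 0; // group3

Console.WriteLine(count); // prints 2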
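
If you need to know which groups matched rather than how many, the same idea could be sketched as follows. The names patterns and MatchingGroups are hypothetical, the patterns are the ones from the question, and top-level statements (C# 9 or later) are assumed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// One separate regex per group, kept in declaration order.
var patterns = new (string Name, Regex Rx)[]
{
    ("group1", new Regex("A|C")),
    ("group2", new Regex("B|C")),
    ("group3", new Regex("A|B|C")),
};

// Hypothetical helper: names of all patterns that match anywhere in the input.
List<string> MatchingGroups(string input) =>
    patterns.Where(p => p.Rx.IsMatch(input))
            .Select(p => p.Name)
            .ToList();

Console.WriteLine(string.Join(", ", MatchingGroups("AXYZ"))); // group1, group3
Console.WriteLine(string.Join(", ", MatchingGroups("BXYZ"))); // group2, group3
```

Note that IsMatch searches the whole input, so a pattern counts as matching wherever its letters occur, not only at the start. Building each Regex once and reusing it is likely to matter if the input really is 1000+ words and is checked repeatedly.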

0
votes

I believe you have to make the regex "greedy". Here is some info on it:

http://blogs.msdn.com/b/ericgu/archive/2005/08/19/453869.aspx

0
votes

The regex engine is eager, which means it will always return the left-most match and stop matching once a match is found. To demonstrate, consider this sample:

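using System;
using System.Text.RegularExpressions;

string input = "Hello World";
// The first alternative that matches wins, even if a later one would match more.
string pattern = "Hello|Hello World";
Console.WriteLine(Regex.Match(input, pattern).Value); // prints "Hello"
pattern = "Hello World|Hello";
Console.WriteLine(Regex.Match(input, pattern).Value); // prints "Hello World"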

In your case the first alternative that can match wins, so the remaining groups are never tried and report Success as false: "AXYZ" is matched by group1 and "BXYZ" by group2, and group3 is never reached for either input. If you need to test each group, you'll have to do so using a separate regex for each.
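
To see the same first-alternative-wins behavior on the pattern as written in the question, something like the following sketch could be used. The helper Succeeded and the swapped pattern are illustrative only (the swap mirrors what the asker describes in the comments), and top-level statements (C# 9 or later) are assumed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: comma-separated names of the groups
// that succeeded in the first match of the pattern.
string Succeeded(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return string.Join(",", new[] { "group1", "group2", "group3" }
        .Where(name => m.Groups[name].Success));
}

// Pattern from the question, and a variant with group2 and group3 swapped.
string original = "(?'group1'A|C)|(?'group2'B|C)|(?'group3'A|B|C)";
string swapped  = "(?'group1'A|C)|(?'group3'A|B|C)|(?'group2'B|C)";

Console.WriteLine(Succeeded(original, "AXYZ")); // group1
Console.WriteLine(Succeeded(original, "BXYZ")); // group2
Console.WriteLine(Succeeded(swapped, "BXYZ"));  // group3
```

Only the order of the alternatives changes between the last two calls, which is why a single alternation cannot report every group that would have matched.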