4
votes

I need to use Regex.Replace to replace only certain named groups in my input string.

So I might have a pattern like:

"^(?<NoReplace>.+)(?<FirstPeriod>(\d{2})|CM|RM|PM|CN|RN){1}(?<LastPeriod>(\d{2})|CM|RM|PM|CN|RN){1}((#(?<NumberFormat>[#,\.\+\-%0]+))*)$"

Tokens such as CM and RM are replaced using Regex.Replace with a MatchEvaluator. However, the replacement should only affect characters in the FirstPeriod and LastPeriod groups.

Example input: "FIELDCNS 01CM"

Desired output: "FIELDCNS 0104"

Incorrect output: "FIELD**04**S 0104"

Is this possible, or am I better off pulling out the parts I want to replace and reassembling them afterwards?

5 Answers

5
votes

I'm not entirely sure I understand what you're asking, but if you want to replace some strings only between parts you're matching with regular expressions, the trick is to capture all the bits you don't want to replace and put them back in the replacement string. For example, to replace a "blah" with "XXXXX" only when it sits between a "foo" and a "bar", you could do:

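' Requires: Imports System.Text.RegularExpressions
' Groups 1 and 2 capture the text to keep; only the "blah" between them is replaced.
' Because .* is greedy, each match replaces one "blah": the last one that still has a "bar" after it.
Dim regex As Regex = New Regex("(foo.*)blah(.*bar)")
Console.WriteLine(regex.Replace( _
    "blah foo bar baz blah baz bar blah blah foo blah", "$1XXXXX$2"))
Console.ReadLine()

This prints:

blah foo bar baz XXXXX baz bar blah blah foo blah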

1
votes

If you want to replace more than one thing, you need more than one match. That means your pattern should match only the parts of the input you want to replace, whereas yours matches everything at once. I think the missing piece here is lookbehind and lookahead.

(?<=.)(\d{2})(?=(\d{2}|CM|RM|PM|CN|RN)|(((#(?<NumberFormat>[#,\.\+\-%0]+))*)$))

This matches two digits that are preceded by any character and followed by either (two digits or CM or RM...) or (an optional number format and the end of the input). Only the two digits get replaced: the lookahead (?=) and lookbehind (?<=) groups don't count as part of the match, so their text is left alone.

This means that for a string like:

"FIELDCNS 01CM02CN"

You would get two calls to your MatchEvaluator, and you could get:

"FIELDCNS XXCMYYCN"

If you just want to replace all the "01" matches in the input with "04", then you don't need a MatchEvaluator at all.
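As a rough sketch of the two-call behaviour described above, the following uses a simplified version of the pattern (the number-format alternative is left out, which is an assumption made to keep the example short). The module name, the ReplaceDigits helper and the "XX"/"YY" replacement values are hypothetical and only serve the illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    ' Simplified from the pattern above (assumption: no number-format branch).
    ' Two digits, preceded by any character and followed by a period token.
    ' The lookbehind and lookahead text is not part of the match.
    Private Const Pattern As String = "(?<=.)\d{2}(?=\d{2}|CM|RM|PM|CN|RN)"

    Function ReplaceDigits(input As String) As String
        Dim calls As Integer = 0
        ' The evaluator runs once per match and sees only the two digits.
        Return Regex.Replace(input, Pattern,
            Function(m As Match)
                calls += 1
                Return If(calls = 1, "XX", "YY")
            End Function)
    End Function

    Sub Main()
        Console.WriteLine(ReplaceDigits("FIELDCNS 01CM02CN")) ' FIELDCNS XXCMYYCN
    End Sub
End Module
```

The "CN" inside "FIELDCNS" is never touched here because the match itself only ever consists of two digits.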

1
votes

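Instead of using Replace, I use String.Remove to remove the group's text and then insert the replacement string at the same position. Just be careful if you are replacing many groups, because each replacement shifts the indexes of the groups that follow it.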

Public Function ReplaceGroup(ByVal regexp As Text.RegularExpressions.Regex, ByVal input As String, ByVal groupName As String, ByVal replacement As String) As String
    Dim match As Text.RegularExpressions.Match = regexp.Match(input)
    If Not match.Success Then Return input
    ' The local needs a different name from the groupName parameter, or it will not compile.
    Dim matchedGroup As Text.RegularExpressions.Group = match.Groups(groupName)
    If Not matchedGroup.Success Then Return input
    ' Cut the group's text out by position and splice in the replacement.
    Return input.Remove(matchedGroup.Index, matchedGroup.Length).Insert(matchedGroup.Index, replacement)
End Function
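
One way to handle several groups with this approach is to work from right to left, so that earlier indexes stay valid while later text changes length. The sketch below is only an illustration: ReplaceGroups, the dictionary-based signature and the shortened pattern in Main are assumptions, not part of the function above, and it does not try to handle nested or overlapping groups.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module GroupReplaceDemo
    ' Hypothetical helper: replaces several named groups of the first match.
    Function ReplaceGroups(regexp As Regex, input As String,
                           replacements As IDictionary(Of String, String)) As String
        Dim m As Match = regexp.Match(input)
        If Not m.Success Then Return input

        ' Keep only groups that took part in the match, rightmost first,
        ' so replacing one group does not shift the index of the next.
        Dim present = replacements.Keys.Where(Function(k) m.Groups(k).Success)
        Dim targets = present.OrderByDescending(Function(k) m.Groups(k).Index)

        Dim result As String = input
        For Each key As String In targets
            Dim g As Group = m.Groups(key)
            result = result.Remove(g.Index, g.Length).Insert(g.Index, replacements(key))
        Next
        Return result
    End Function

    Sub Main()
        ' Shortened pattern for illustration only (assumption).
        Dim rx As New Regex("^(?<NoReplace>.+?)(?<FirstPeriod>\d{2}|CM|RM)(?<LastPeriod>\d{2}|CM|RM)$")
        Dim values As New Dictionary(Of String, String) From {{"FirstPeriod", "03"}, {"LastPeriod", "04"}}
        Console.WriteLine(ReplaceGroups(rx, "FIELDCNS 01CM", values)) ' FIELDCNS 0304
    End Sub
End Module
```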

0
votes

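You could have the MatchEvaluator rebuild the whole match, applying the token replacement only to the FirstPeriod and LastPeriod groups. Something like this:

' pattern is the pattern from the question
Dim evaluator As MatchEvaluator = AddressOf PeriodReplace
Dim result As String = Regex.Replace("FIELDCNS 01CM", pattern, evaluator)

Public Function PeriodReplace(match As Match) As String
    Dim replaceTokens As New Regex("(CM|RM)")
    Dim replaceText As String = "04"
    ' Everything after LastPeriod (the optional "#" number format) is copied unchanged.
    ' Using match.Groups("NumberFormat").Value here would drop the leading "#".
    Dim lastPeriod As Group = match.Groups("LastPeriod")
    Dim suffix As String = match.Value.Substring(lastPeriod.Index + lastPeriod.Length - match.Index)
    Return match.Groups("NoReplace").Value & _
        replaceTokens.Replace(match.Groups("FirstPeriod").Value, replaceText) & _
        replaceTokens.Replace(lastPeriod.Value, replaceText) & _
        suffix
End Function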

0
votes

I've also had this problem, and I addressed it by creating some extension methods on the Match object that replace a named group's value within the larger match value. In this example, I want to replace the value of the "id" group without having to worry about the surrounding junk:

' contents is an existing String variable; it cannot be declared and read in the same statement
contents = Regex.Replace(contents, "\|(?'id'\d+)\r\n", 
                      Function(m As Match)
                         Return m.ReplaceGroupValue("id", "[REPLACEMENT VALUE]")
                      End Function)

which uses:

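' Requires: Imports System.Runtime.CompilerServices; must be declared inside a Module
<Extension()> _
Function ReplaceGroupValue(ByVal m As Match, ByVal sGroupName$, ByVal sNewValue$) As String
    'get the value of the specified group
    Dim value = m.Groups(sGroupName).Value

    'note: String.Replace swaps every occurrence of that text within the match
    Return m.Value.Replace(value, sNewValue)
End Function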

If the replacement value is actually a more complicated function of the value to be replaced, it's more convenient to use this form:

contents = Regex.Replace(contents, "\|(?'id'\d+)\r\n", 
                      Function(m As Match)
                         Return m.ReplaceGroupValue("id", Function(id) [do something with the id])
                      End Function)

<Extension()> _
Function ReplaceGroupValue(ByVal m As Match, ByVal sGroupName$, ByVal callback As Func(Of String, String)) As String
    'get the value of the specified group
    Dim value = m.Groups(sGroupName).Value

    Return m.Value.Replace(value, callback(value))
End Function

The ReplaceGroupValue function replaces the group value within the larger match expression, so you can concentrate on the named group that you want to work with.
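
One caveat: String.Replace works on text, so if the group's value also appears elsewhere inside the match, every occurrence is swapped. A position-based variant avoids that. The sketch below is an assumption-laden illustration rather than part of the methods above; ReplaceGroupAt, the module names and the sample input are all hypothetical.

```vbnet
Imports System
Imports System.Runtime.CompilerServices
Imports System.Text.RegularExpressions

Module MatchExtensions
    ' Hypothetical variant: replaces the group by position instead of by text.
    <Extension()>
    Function ReplaceGroupAt(m As Match, groupName As String, newValue As String) As String
        Dim g As Group = m.Groups(groupName)
        If Not g.Success Then Return m.Value
        ' Group.Index is relative to the whole input, so offset it by Match.Index.
        Dim start As Integer = g.Index - m.Index
        Return m.Value.Remove(start, g.Length).Insert(start, newValue)
    End Function
End Module

Module Demo
    Sub Main()
        ' "12" occurs both before the pipe and as the id; only the id should change.
        Dim result = Regex.Replace("12|12;", "\d+\|(?<id>\d+);",
            Function(m As Match) m.ReplaceGroupAt("id", "X"))
        Console.WriteLine(result) ' 12|X;
    End Sub
End Module
```

With the text-based ReplaceGroupValue, the same input would come out as "X|X;" because both copies of "12" are replaced.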