
I need to replace the value inside a capture group of a regular expression with some arbitrary value. I've had a look at re.sub, but it seems to work in a different way.

I have a string like this one:

s = 'monthday=1, month=5, year=2018'

and a regex that matches it, with named capture groups:

regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
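For reference, a minimal sketch (Python 3) of what the match object already exposes for these named groups: groupdict() gives the captured text, and span(name) gives each group's start and end offsets in s, which is the information any group-wise replacement has to work from.

```python
import re

s = 'monthday=1, month=5, year=2018'
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
match = regex.search(s)

# text captured by each named group
captured = match.groupdict()
# (start, end) offsets of each named group inside s
spans = {name: match.span(name) for name in regex.groupindex}

print(captured)  # {'d': '1', 'm': '5', 'Y': '2018'}
print(spans)     # {'d': (9, 10), 'm': (18, 19), 'Y': (26, 30)}
```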

Now I want to replace the group named d with aaa, the group named m with bbb and the group named Y with ccc, as in the following example:

'monthday=aaa, month=bbb, year=ccc'

Basically, I want to keep all the non-matching parts of the string and substitute each matching group with some arbitrary value.

Is there a way to achieve the desired result?
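For context on why re.sub seems to work differently here, a small sketch (Python 3): its replacement string stands in for the entire match, not for each group, and group references such as \g<name> bring captured text back rather than replacing it.

```python
import re

s = 'monthday=1, month=5, year=2018'
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

# the replacement string stands in for the WHOLE match, not for each group
whole = regex.sub('X', s)
# \g<name> in the replacement re-inserts the text a group captured
reordered = regex.sub(r'\g<Y>-\g<m>-\g<d>', s)

print(whole)      # X
print(reordered)  # 2018-5-1
```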

Note

This is just an example; I could have other input regexes with a different structure, but with the same named capturing groups...

Update

Since most people seem to be focusing on the sample data, here is another sample. Let's say I have this other input data and regex:

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'

As you can see, I still have the same number of named capturing groups (3), and they are named the same way, but the structure is totally different. What I need, though, is the same as before: replace the capturing groups with some arbitrary text:

'ccc-bbb-aaa'

That is, replace the capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.
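Strictly speaking, the outer parentheses in this second regex add a fourth, unnamed capturing group; only the three named groups are shared with the first example. A quick check (Python 3) of what the compiled pattern reports:

```python
import re

regex = re.compile(r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))')
n_groups = regex.groups          # counts every capturing group, named or not
index = dict(regex.groupindex)   # group name -> group number
captured = regex.search('2018-12-12').groupdict()

print(n_groups)  # 4
print(index)     # {'Y': 2, 'm': 3, 'd': 4}
print(captured)  # {'Y': '2018', 'm': '12', 'd': '12'}
```

So any generic solution should look groups up by name (groupindex), not by position.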

In case regexes are not the best tool for the job, I'm open to other proposals that achieve my goal.

regex.sub('monthday=aaa, month=bbb, year=ccc', s) - Aran-Fey
@Rawing with your solution I need to hardcode the new result, but that is not what I'm asking for. I want to replace the matching groups with some arbitrary value. This is just an example; I could have other input regexes with a different structure but the same named capturing groups... - aleroot
@Rawing read the first line of the question: "I need to replace the value inside a capture group of a regular expression with some arbitrary value". That is not what your solution actually does... - aleroot
@Rawing the input regex and the input text could change; what is fixed is the names of the capturing groups that I need to replace with some other data. If you want, I could add another dozen samples with a different structure but the same number and names of capturing groups... - aleroot
@RomanPerekhrest I have updated the question to make it clearer. - aleroot

4 Answers

7
votes

This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.
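For comparison, a sketch of the conventional direction (Python 3): capture the text you want to keep and leave the values uncaptured, then re-insert the kept text with \g<name>. The group names k1, k2 and k3 are hypothetical.

```python
import re

s = 'monthday=1, month=5, year=2018'
# Capture the text to KEEP (hypothetical groups k1, k2, k3) and leave the
# values uncaptured; \g<name> in the replacement puts the kept text back.
pattern = re.compile(r'(?P<k1>monthday=)\d{1,2}(?P<k2>, month=)\d{1,2}(?P<k3>, year=)20\d{2}')
result = pattern.sub(r'\g<k1>aaa\g<k2>bbb\g<k3>ccc', s)

print(result)  # monthday=aaa, month=bbb, year=ccc
```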

Since you've written your regex the wrong way, you have to do most of the substitution operation manually:

"""
Replaces the text captured by named groups.
"""
def replace_groups(pattern, string, replacements):
    pattern = re.compile(pattern)
    # create a dict of {group_index: group_name} for use later
    groupnames = {index: name for name, index in pattern.groupindex.items()}

    def repl(match):
        # we have to split the matched text into chunks we want to keep and
        # chunks we want to replace
        # captured text will be replaced. uncaptured text will be kept.
        text = match.group()
        chunks = []
        lastindex = 0
        for i in range(1, pattern.groups+1):
            groupname = groupnames.get(i)
            if groupname not in replacements:
                continue

            # keep the text between this match and the last
            chunks.append(text[lastindex:match.start(i)])
            # then instead of the captured text, insert the replacement text for this group
            chunks.append(replacements[groupname])
            lastindex = match.end(i)
        chunks.append(text[lastindex:])
        # join all the junks to obtain the final string with replacements
        return ''.join(chunks)

    # for each occurence call our custom replacement function
    return re.sub(pattern, repl, string)
>>> replace_groups(regex, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
'monthday=aaa, month=bbb, year=ccc'
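A more compact variant of the same splicing idea, as a sketch (the name splice_groups is hypothetical): collect the spans of the named groups that have a replacement, sort them by position and splice the new text in. It assumes the named groups do not overlap or nest inside each other; unnamed groups, such as the outer one in the second example, are simply ignored.

```python
import re

def splice_groups(pattern, string, replacements):
    """Replace the text of each named group listed in `replacements`,
    keeping everything else in every match unchanged (hypothetical helper)."""
    pattern = re.compile(pattern)

    def repl(match):
        base = match.start()
        text = match.group()
        # (start, end, new_text) relative to the match, for every named
        # group that exists in the pattern and took part in this match
        spans = sorted(
            (match.start(name) - base, match.end(name) - base, new)
            for name, new in replacements.items()
            if name in pattern.groupindex and match.start(name) != -1
        )
        out, pos = [], 0
        for start, end, new in spans:
            out.append(text[pos:start])  # kept text before the group
            out.append(new)              # replacement for the group
            pos = end
        out.append(text[pos:])           # kept text after the last group
        return ''.join(out)

    return pattern.sub(repl, string)

values = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
first = splice_groups(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})',
                      'monthday=1, month=5, year=2018', values)
second = splice_groups(r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))',
                       'Date: 2018-12-12', values)
print(first)   # monthday=aaa, month=bbb, year=ccc
print(second)  # Date: ccc-bbb-aaa
```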
2
votes

You can use string formatting with a regex substitution:

import re
s = 'monthday=1, month=5, year=2018'
s = re.sub(r'(?<=\=)\d+', '{}', s).format(*['aaa', 'bbb', 'ccc'])

Output:

'monthday=aaa, month=bbb, year=ccc'
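A variant of the same positional idea, sketched without str.format: a replacement function pulls the next value from an iterator. Unlike .format(), this does not break if the text already contains braces (for example, '{x}=1' would make .format() raise a KeyError). It assumes there are at least as many values as matches; otherwise next() raises StopIteration.

```python
import re

s = 'monthday=1, month=5, year=2018'
# replacement values, consumed in the order the matches appear in s
values = iter(['aaa', 'bbb', 'ccc'])
# each run of digits preceded by '=' is replaced by the next value
result = re.sub(r'(?<=\=)\d+', lambda m: next(values), s)

print(result)  # monthday=aaa, month=bbb, year=ccc
```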

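Edit: for the second input the same formatting trick works, but the pattern passed to re.sub must match only the text to be replaced (here, each run of digits). Passing the full date regex would replace the whole match with a single {}, which gives just 'aaa':

text = '2018-12-12'
new_s = re.sub(r'\d+', '{}', text).format(*["ccc", "bbb", "aaa"])  # 'ccc-bbb-aaa'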
2
votes

Extended Python 3.x solution on an extended example (re.sub() with a replacement function):

import re

d = {'d':'aaa', 'm':'bbb', 'Y':'ccc'}  # predefined dict of replacement words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    pair = next(t for t in m.groupdict().items() if t[1])
    k = next(filter(None, m.groups()))  # preceding `key` for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=')
    return k + d.get(pair[0], '')

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)

print(result)

The output:

Data: year=ccc, monthday=aaa, month=bbb, some other text

For Python 2.7, change the line k = next(filter(None, m.groups())) to:

k = filter(None, m.groups())[0]
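A shorter replacement function for the same pattern, as a sketch relying on Match.lastindex and Match.lastgroup: in each alternative the named value group is the last capturing group to close, so lastgroup is its name and lastindex - 1 is the index of the unnamed prefix group just before it. This only holds for this specific "prefix group, then named group" layout.

```python
import re

d = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    # prefix group (e.g. 'year=') + replacement for the named group that matched
    return m.group(m.lastindex - 1) + d.get(m.lastgroup, '')

result = pat.sub(repl, 'Data: year=2018, monthday=1, month=5, some other text')
print(result)  # Data: year=ccc, monthday=aaa, month=bbb, some other text
```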
0
votes

I suggest using a loop:

import re
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
s = 'monthday=1, month=1, year=2017   \n'
s+= 'monthday=2, month=2, year=2019'


regex_as_str =  'monthday={d}, month={m}, year={Y}'
matches = [match.groupdict() for match in regex.finditer(s)]
for match in matches:
    s = s.replace(
        regex_as_str.format(**match),
        regex_as_str.format(**{'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
    )    

You can do this multiple times with your different regex patterns.
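A sketch of doing that for several patterns, assuming each regex is paired with a format template that rebuilds exactly the text it matches (the names patterns and replace_all are hypothetical):

```python
import re

# Hypothetical (compiled regex, format template) pairs: each template rebuilds
# the matched text from the named groups d, m and Y.
patterns = [
    (re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})'),
     'monthday={d}, month={m}, year={Y}'),
    (re.compile(r'(?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2})'),
     '{Y}-{m}-{d}'),
]
replacements = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}

def replace_all(s):
    for regex, template in patterns:
        # collect the matches first, then swap each matched text for the
        # template filled with the replacement values
        for match in list(regex.finditer(s)):
            s = s.replace(template.format(**match.groupdict()),
                          template.format(**replacements))
    return s

result = replace_all('monthday=1, month=5, year=2018\n2018-12-12')
print(result)
```

Note that str.replace rewrites every identical occurrence at once, which is fine here because identical matches get identical replacements.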

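Or you can join both patterns into one with alternation (|); note that Python's re module does not allow the same group name twice in one pattern, so the groups in each alternative would need distinct names.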