1
votes

How can I substitute only group 1 of a regex match with a given string in Python?

Question1:

str = "aaa bbb ccc"
regex = "\baaa (bbb)\b"
repl = "111 bbb 222"

The regex matches "aaa bbb" in str. I want to replace only group 1 ("bbb") with "111 bbb 222", so that the result is "aaa 111 bbb 222 ccc":

str_repl = "aaa 111 bbb 222 ccc"

Thanks to @RomanPerekhrest and @janos for the lookbehind method.

I would also like to know how to handle some more general scenarios:

Question2:

s1 = "bBb"
regex = "(?<=\baaa )" + s1 + "\b"  # may not be suitable
repl = "XxX " + s1 + " YyY"

target:

s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"

s1 = "aaa bbb ccc"
no match

s2 = "AAA bBb ccc"
s2_repl = "AAA XxX bBb YyY ccc"

When matching against the original string, ignore case for everything except s1, which must match case-sensitively.
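
One possible way to express this requirement, as a minimal sketch: it assumes Python 3.6 or later, where scoped inline flags such as (?i:...) are available, so that only the aaa prefix ignores case. The helper name wrap_case_sensitive is hypothetical, and re.escape is added on the assumption that s1 should be treated as literal text.

```python
import re

def wrap_case_sensitive(text, s1):
    # (?i:aaa) lets the prefix match in any case; s1 itself stays case-sensitive.
    pattern = r"(?<=\b(?i:aaa) )" + re.escape(s1) + r"\b"
    # A function replacement avoids backslash processing in the replacement text.
    return re.sub(pattern, lambda m: "XxX " + m.group(0) + " YyY", text)

print(wrap_case_sensitive("aaa bBb ccc", "bBb"))  # aaa XxX bBb YyY ccc
print(wrap_case_sensitive("aaa bbb ccc", "bBb"))  # no match: aaa bbb ccc
print(wrap_case_sensitive("AAA bBb ccc", "bBb"))  # AAA XxX bBb YyY ccc
```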

Question3:

s1 = "bbb"
regex = "(?<=\baaa )" + s1 + "\b"  # may not be suitable
repl = "XxX " + s1 + " YyY"

target:

s0 = "aaa bBb ccc"
s0_repl = "aaa XxX bBb YyY ccc"

s1 = "aaa bbb ccc"
s1_repl = "aaa XxX bbb YyY ccc"

s2 = "AAA BBB ccc"
s2_repl = "AAA XxX BBB YyY ccc"

Here case is ignored for the whole match, including s1, and the substitution keeps the matched text in its original case.
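
A minimal sketch of how this variant might look, assuming that the whole pattern may ignore case and that the matched text should be copied into the result unchanged. The helper name wrap_ignore_case is hypothetical.

```python
import re

def wrap_ignore_case(text, s1):
    # re.IGNORECASE applies to both the "aaa " prefix and s1.
    pattern = r"(?<=\baaa )" + re.escape(s1) + r"\b"
    # m.group(0) is the text as it appears in the input, so its case is preserved.
    return re.sub(pattern, lambda m: "XxX " + m.group(0) + " YyY",
                  text, flags=re.IGNORECASE)

print(wrap_ignore_case("aaa bBb ccc", "bbb"))  # aaa XxX bBb YyY ccc
print(wrap_ignore_case("aaa bbb ccc", "bbb"))  # aaa XxX bbb YyY ccc
print(wrap_ignore_case("AAA BBB ccc", "bbb"))  # AAA XxX BBB YyY ccc
```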

Question4:

Is there a way to substitute only group 1 in the original string using a regex in Python?

3
Your general scenario sounds simple and complex at the same time. Can you elaborate to make it clearer? - RomanPerekhrest
Sorry, I deleted part of the description just now. Please check the question's details again. - VikoTse
@Wiktor, thanks for your demo. I added questions 2-4 above. - VikoTse

3 Answers

1
votes

To replace the sequence bbb only where it is preceded by the sequence aaa, use the following approach:

import re
s = "aaa bbb ccc"
regex = r"(?<=aaa )bbb\b"
repl = "111 bbb 222"

str_replaced = re.sub(regex, repl, s)
print(str_replaced)

The output:

aaa 111 bbb 222 ccc

(?<=aaa ) - a positive lookbehind assertion; it ensures that "bbb" is preceded by "aaa " without making "aaa " part of the match

http://www.regular-expressions.info/lookaround.html
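
To make the effect of the lookbehind concrete, here is a small comparison sketch. The variable names without_lookbehind and with_lookbehind are only illustrative. Note that Python's re module requires the lookbehind pattern to have a fixed width, which "aaa " satisfies.

```python
import re

s = "aaa bbb ccc"
repl = "111 bbb 222"

# Without a lookbehind, "aaa " is part of the match and is replaced as well.
without_lookbehind = re.sub(r"\baaa bbb\b", repl, s)

# With a lookbehind, "aaa " is only checked; just "bbb" is consumed and replaced.
with_lookbehind = re.sub(r"(?<=\baaa )bbb\b", repl, s)

print(without_lookbehind)  # 111 bbb 222 ccc
print(with_lookbehind)     # aaa 111 bbb 222 ccc
```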

1
votes

You can use the re module with a positive lookbehind:

import re
s = "aaa bbb ccc"
regex = r"\b(?<=aaa )(bbb)\b"
repl = "111 bbb 222"
print(re.sub(regex, repl, s))

This will produce:

aaa 111 bbb 222 ccc

Notice the changes I made:

  • The aaa prefix in the regex is wrapped in (?<=...). This means: match bbb only if it follows aaa, without including aaa in the text to be replaced. This is called a positive lookbehind. Without this change to your regex, aaa would disappear together with bbb.
  • Regular expression strings should be written as r"..." to make them raw strings. This avoids problems with escape sequences, such as "\b" being read as a backspace character instead of a word boundary.
  • I renamed the str variable to s, as @elena also suggested, because str is the name of Python's built-in string type and assigning to it shadows the built-in.
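
The raw-string point can be checked with a short sketch; the variable names non_raw and raw are only illustrative.

```python
import re

s = "aaa bbb ccc"

non_raw = "\baaa (bbb)\b"  # Python turns each "\b" into a backspace character
raw = r"\baaa (bbb)\b"     # the regex engine sees "\b" as a word boundary

print(non_raw == raw)                 # False
print(re.search(non_raw, s))          # None, since s contains no backspace
print(re.search(raw, s).group(1))     # bbb
```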
0
votes

First of all, don't use str as a variable name. It is the name of Python's built-in string type, and assigning to it shadows the built-in.

import re

str1 = "aaa bbb ccc"
re.sub("bbb", "111 bbb 222", str1)
Out[11]: 'aaa 111 bbb 222 ccc'
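
Note that the call above replaces every bbb, whether or not aaa precedes it. If only one group of a longer pattern should be replaced, a rough sketch of a general helper could use the match offsets. The name sub_group and its parameters are hypothetical, not part of the re module, and repl is inserted as literal text.

```python
import re

def sub_group(pattern, repl, text, group=1, flags=0):
    """Replace only the given group of each match; keep the rest of the match."""
    def replace(m):
        whole = m.group(0)
        if m.start(group) == -1:
            # The group did not take part in this match; leave it unchanged.
            return whole
        # Offsets of the group relative to the start of the whole match.
        start = m.start(group) - m.start(0)
        end = m.end(group) - m.start(0)
        return whole[:start] + repl + whole[end:]
    return re.sub(pattern, replace, text, flags=flags)

print(sub_group(r"\baaa (bbb)\b", "111 bbb 222", "aaa bbb ccc"))
# aaa 111 bbb 222 ccc
```

Unlike a lookbehind, this approach does not require the prefix to have a fixed width.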