How do I keep part of the character string used in regular expressions for pattern matching when replacing?

Question

I'm using stringr to manipulate some HTML stored in a character vector. The code looks like:

foo <- 'text-align:left;"> 4: Forging Foundations </td>\n'

In my full HTML, different strings appear in place of 4: Forging Foundations, many times over, and I need to match this whole section of code as the pattern to replace. The final output I'm looking for is:

'text-align:left;background-color: #B0fff4 !important;"> 4: Forging Foundations </td>\n'

So I thought of using the . regexp with the * quantifier in place of 4: Forging Foundations:

foo <- str_replace_all(
  foo,
  'text-align:left;">.*(?=</td>\n)',
  'text-align:left;background-color: #B0fff4 !important">.*(?=</td>\n)'
)

However, this replaces part of my original string with the literal regular expression syntax I used. I'm looking for some way to keep that part of the string untouched.

Do you realize you can't use a regex pattern in the replacement string? — Wiktor Stribiżew
Why use regex if you seem to replace hardcoded, fixed strings? Try sub('text-align:left;">', 'text-align:left;background-color: #B0fff4 !important;">', foo, fixed=TRUE) — Wiktor Stribiżew
There are other parts of my html code which feature 'text-align:left;"> but I don't want to replace those, only those which have a similar structure to foo. — Nautica
Then, gsub('text-align:left;">([^<]*</td>)', 'text-align:left;background-color: #B0fff4 !important;">\\1', foo)? If you need to replace all occurrences, gsub seems to be a good-enough base R function — Wiktor Stribiżew

Wiktor Stribiżew · Accepted Answer · 2019-10-02T16:52:14

You may use

gsub('text-align:left;">([^<]*</td>)', 'text-align:left;background-color: #B0fff4 !important;">\\1', foo)
# => [1] "text-align:left;background-color: #B0fff4 !important;\"> 4: Forging Foundations </td>\n"

The ([^<]*</td>) part is a capturing group that matches zero or more characters other than <, followed by </td>. In the replacement string, the captured text is restored with the \\1 backreference (written \\1 in an R string literal, which the regex engine sees as \1).
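As a minimal sketch of how the capturing group behaves on more than one string, the snippet below runs the same pattern over a small vector. The vector html and its second element are hypothetical sample data, not taken from the question; the second element only shares the text-align:left;"> prefix and is expected to stay unchanged. The stringr call is included on the assumption that the package is installed, since str_replace_all accepts the same \\1 reference in its replacement argument.

```r
# Hypothetical sample input: one table cell that should change, and one
# paragraph that shares the same style prefix but should stay as-is.
html <- c(
  'text-align:left;"> 4: Forging Foundations </td>\n',
  'text-align:left;"> Intro <b>text</b></p>\n'
)

pattern     <- 'text-align:left;">([^<]*</td>)'
replacement <- 'text-align:left;background-color: #B0fff4 !important;">\\1'

# Base R: \\1 re-inserts whatever the parenthesised group matched.
out <- gsub(pattern, replacement, html)
cat(out, sep = "")

# stringr takes the same pattern and replacement; run only if it is installed.
if (requireNamespace("stringr", quietly = TRUE)) {
  out_stringr <- stringr::str_replace_all(html, pattern, replacement)
  cat(out_stringr, sep = "")
}
```

Only the first element is rewritten, because [^<]* cannot cross the <b> tag to reach </td> in the second one.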
