Suppose I have the following data frame:
library(dplyr)
library(stringr)
input <- data.frame(
  Id = c(1:6),
  text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
           "(714.93+) (714*)", "(719)", "(718.4)"))
And I would like to obtain the following output:
Output <- data.frame(
  Id = c(1:6),
  text = c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
           "(714.93+) (714*)", "(719) (299)", "(718.4)"),
  first_match = c(1,0,0,0,1,0),
  second_match = c(1,1,0,1,1,0))
That is, first_match should be 1 if any of the exact strings (714), (719) or (718) appears in text, parentheses included. Likewise, second_match should be 1 if any of (714.33), (714*) or (719) appears.
When I want to check whether a pattern occurs in a string, I normally use the str_detect function from the stringr package. In this case, however, the strings contain symbols such as . + * (which are special in regular expressions), and I am not getting the expected output.
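A minimal sketch of what seems to be going wrong, assuming the parentheses are meant as literal characters of each code. The vector x and the result names are only illustrative and not part of the data above.

x <- c("(714.33)", "(714)", "(714*)")

```r
library(stringr)

x <- c("(714.33)", "(714)", "(714*)")

# Unescaped parentheses only form a regex group, so this finds "714" anywhere
grouped <- str_detect(x, "(714)")

# Escaped parentheses are literal, so only the exact code "(714)" matches
literal <- str_detect(x, "\\(714\\)")

# The star has to be escaped as well to match the literal code "(714*)"
starred <- str_detect(x, "\\(714\\*\\)")

print(rbind(grouped, literal, starred))
```

With these three strings, grouped is TRUE everywhere, literal is TRUE only for the second string and starred only for the third, which suggests the parentheses need escaping just like the dot and the star.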
I have tried the following two attempts, neither of which gives the expected output:
attempt_1 <- input %>%
  mutate(first_match = ifelse(str_detect(text, "(714)|(719)|(718)"), 1, 0),
         second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)|(719)"), 1, 0))
attempt_2 <- input %>%
  mutate(first_match = ifelse(str_detect(text, fixed("(714)|(719)")), 1, 0),
         second_match = ifelse(str_detect(text, "(714\\.33)|(714\\*)"), 1, 0))
I tried escaping the special symbols, and I also tried exact matching with fixed() (which I suppose fails because the | is then no longer interpreted as an OR).
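The suspicion about fixed() can be checked with a short sketch. It also shows one possible workaround under that assumption: test each code separately with fixed() and combine the results with a logical OR. The helper detect_any is hypothetical (it is not a stringr function), and the text vector simply repeats the input column above.

```r
library(stringr)

text <- c("(714.4) (714) (714*)", "(714.33)", "(189) (1938.23)",
          "(714.93+) (714*)", "(719)", "(718.4)")

# With fixed(), "|" is an ordinary character, so this searches for the
# literal string "(714)|(719)", which occurs in none of the rows
literal_pipe <- str_detect(text, fixed("(714)|(719)"))

# Hypothetical helper: TRUE where any of the codes occurs literally
detect_any <- function(string, codes) {
  hits <- lapply(codes, function(code) str_detect(string, fixed(code)))
  Reduce(`|`, hits)
}

first_match  <- as.integer(detect_any(text, c("(714)", "(719)", "(718)")))
second_match <- as.integer(detect_any(text, c("(714.33)", "(714*)", "(719)")))

print(data.frame(text, first_match, second_match))
```

Because every code is matched literally, nothing needs escaping here. The trade-off is one str_detect call per code instead of a single regular expression.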
Any ideas?