
I'm trying to match strings between my_list and a data frame (df). Depending on whether there is a match (TRUE/FALSE), I need to populate a new_name column in df with the first string of the matching list (my_list[[i]][1]) when there is a match, or with the "cat" column value when there is none.

My data frame is as follows:

name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)

My list:

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)

My for loop with ifelse and grepl is as follows:

for (j in 1:nrow(df)) {
      for (i in 1:length(my_list)) {
        df[j, "new_name"]<- ifelse( 
        grepl(paste(my_list[[i]], collapse="|"), tolower(df[j, "name"])),
          my_list[[i]][1],
          df[j, "cat"])
      }
}

Expected output is :

df["new_name"]<- c("leasure", "none", "none", "transportation", "communication")

         name            cat       new_name
1 NETFLIX.COM           none        leasure
2      BlueTV           none           none
3         smv           none           none
4       trafi transportation transportation
5     alkatel  communication  communication

Currently, with the for loop I wrote, I obtain an exact copy of the "cat" column, meaning that all cases are treated as non-matching (FALSE) in the ifelse function. I'm not sure what's wrong here... Any help would be appreciated!

It doesn't make sense to use ifelse() in that loop. Use an if statement for flow control. ifelse() is used for a vectorized selection. - user2554330

2 Answers


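It doesn't make sense to use ifelse() in that context: it is for vectorized selection. But your code would work if you had the pattern matching right. Unfortunately, for j == 1 and i == 2 (when you expected a match), your pattern is

"leasure|MTV|NETFLIX.COM"

and you are trying to match it to tolower(df[j, "name"]), which is

"netflix.com"

Because grepl() is case-sensitive by default, the upper-case alternative NETFLIX.COM in the pattern cannot match the lower-cased name.
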
You should map both strings to lowercase, or set ignore.case = TRUE in the grepl() call. For example,

name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)

for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- 
      if( grepl(paste(my_list[[i]], collapse="|"), df[j, "name"],
            ignore.case = TRUE))
        my_list[[i]][1]
      else df[j, "cat"]
  }
}
df
#>          name            cat       new_name
#> 1 NETFLIX.COM           none        leasure
#> 2      BlueTV           none           none
#> 3         smv           none           none
#> 4       trafi transportation transportation
#> 5     alkatel  communication  communication

Created on 2021-08-10 by the reprex package (v2.0.0)
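
One caveat with the loop above: every pass over my_list assigns new_name again, so a match found in an earlier list is overwritten by the "cat" fallback when a later list does not match. The sample data hides this because the only match sits in the last list. Here is a minimal sketch of one way around it, assuming the first matching list should win. The "tivago" row is hypothetical data added only to trigger the case.

```r
# Hypothetical data: "tivago" is added so that the FIRST list (travel) matches.
name <- c("NETFLIX.COM", "tivago", "trafi")
cat <- c("none", "none", "transportation")
df <- data.frame(name, cat, stringsAsFactors = FALSE)

travel <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel, leasure)

# Start from the fallback value, then overwrite only when a list matches.
df$new_name <- df$cat
for (j in seq_len(nrow(df))) {
  for (i in seq_along(my_list)) {
    pattern <- paste(my_list[[i]], collapse = "|")
    if (grepl(pattern, df[j, "name"], ignore.case = TRUE)) {
      df[j, "new_name"] <- my_list[[i]][1]
      break  # first matching list wins; later lists cannot undo it
    }
  }
}
df
```

Initialising new_name from "cat" before the loop means the fallback is written once, and break stops the inner loop as soon as a list matches.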

Generally speaking, using pattern matching to find whether a string is in a list is tricky; be really careful that the strings in my_list never include characters that grepl() treats as special in a regular expression (the "." in "NETFLIX.COM" is one such character: it matches any single character). For your example, you'll get the same result as grepl() gives by using the test

tolower(df[j, "name"]) %in% tolower(my_list[[i]])

but that's not true for all possible name values: the grepl() code allows partial matches (e.g. df[j, "name"] equal to "netflix.com in a long string") and %in% does not.
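
To make the two caveats concrete, here is a small sketch. It first shows the "." in "NETFLIX.COM" matching an unintended string, then defines a helper that keeps partial matching but treats every keyword literally. The helper name matches_any is hypothetical, not something from the question. It lower-cases both sides because ignore.case cannot be combined with fixed = TRUE.

```r
leasure <- c("leasure", "MTV", "NETFLIX.COM")

# "." is a regex wildcard, so the combined pattern also matches "netflixxcom".
regex_hit <- grepl(paste(leasure, collapse = "|"), "netflixxcom",
                   ignore.case = TRUE)

# Hypothetical helper: TRUE if x contains any keyword as a literal substring,
# ignoring case. fixed = TRUE switches off regex interpretation.
matches_any <- function(x, keywords) {
  any(vapply(keywords,
             function(k) grepl(tolower(k), tolower(x), fixed = TRUE),
             logical(1)))
}

literal_hit <- matches_any("netflixxcom", leasure)
partial_hit <- matches_any("netflix.com in a long string", leasure)
exact_only  <- tolower("netflix.com in a long string") %in% tolower(leasure)
```

Here regex_hit is TRUE while literal_hit is FALSE, and partial_hit is TRUE while the %in% test in exact_only is FALSE. Which behaviour is right depends on whether partial matches are wanted.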


Here is one way using stringr::str_replace_all -

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
#Create a named list
my_list<- dplyr::lst(travel, leasure)

result <- stringr::str_replace_all(df$name, setNames(names(my_list), 
          sapply(my_list, paste0, collapse = '|')))

#If the result is same as original value keep the previous cat.
df$new_name <- ifelse(result == df$name, df$cat, result)

#         name            cat       new_name
#1 NETFLIX.COM           none        leasure
#2      BlueTV           none           none
#3         smv           none           none
#4       trafi transportation transportation
#5     alkatel  communication  communication

Here the important part is this code -

setNames(names(my_list), sapply(my_list, paste0, collapse = '|'))

#travel|air_com|AIRCAT|tivago      leasure|MTV|NETFLIX.COM 
#                    "travel"                    "leasure" 

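This means that whenever a match for the pattern travel|air_com|AIRCAT|tivago is found in a string, the matched text is replaced with "travel", and likewise for "leasure". Only the matched portion is replaced, and the matching is case-sensitive, so a name that merely contains a keyword would come back as a mix of the label and the leftover text.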
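
For comparison, the same named-list idea can be sketched in base R without stringr. This is only an illustration under a few assumptions: the list names double as the labels (they equal the first element of each list here), matching ignores case (unlike the stringr version above), and when several categories match, the one listed first wins.

```r
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat <- c("none", "none", "none", "transportation", "communication")
df <- data.frame(name, cat, stringsAsFactors = FALSE)

# Named list: each name is the label to assign on a match.
my_list <- list(
  travel  = c("travel", "air_com", "AIRCAT", "tivago"),
  leasure = c("leasure", "MTV", "NETFLIX.COM")
)

# Start from the fallback, then overwrite matching rows category by category.
# Iterating in reverse lets earlier categories take precedence.
df$new_name <- df$cat
for (nm in rev(names(my_list))) {
  hit <- grepl(paste(my_list[[nm]], collapse = "|"), df$name,
               ignore.case = TRUE)
  df$new_name[hit] <- nm
}
df
```

Each grepl() call is vectorized over the whole name column, so the only loop left is over the handful of categories. The whole cell is set to the label, so a partial match does not leave leftover text behind.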