1
votes

I'm trying to figure out the syntax of dplyr and I'm having trouble passing more than a single column to another function (e.g. str_detect). I want to search through a tibble and select all rows where a certain string is detected. I can do this for a specific column (e.g. col3 in the example below), but I would like to search some or all columns.

library(dplyr)
library(stringr)

col1 <- c("plate_ABC", "text", "text", "text")
col2 <- c("text", "this is plate B", "text", "text")
col3 <- c("text", "text", "C-plate", "text")

df <- tibble(col1, col2, col3)

df %>% filter(str_detect(col3, "plate"))

Output:

df %>% filter(str_detect(col3, "plate"))

## A tibble: 1 x 3
#  col1  col2  col3   
#  <chr> <chr> <chr>  
#1 text  text  C-plate

Desired Output:

df %>% filter(str_detect(?SOME/ALL Cols?, "plate"))

## A tibble: 3 x 3
#  col1      col2            col3   
#  <chr>     <chr>           <chr>  
#1 plate_ABC text            text   
#2 text      this is plate B text   
#3 text      text            C-plate

Should your expected solution avoid filter or reduce? I am confused. - akrun
No, sorry if I'm unclear. Any and all solutions are great. I'm very new to coding (only a couple of weeks in), so it takes me a while to understand the syntax and figure out the most convenient way to use the different commands. I think that in this case rowwise %>% filter is most intuitive for me to understand. - Mario Niepel
The rowwise approach should be slow compared to the vectorized option. - akrun
Thank you for pointing this out. Right now speed is not an issue (the data tables are very small), but I will keep this in mind as I become more familiar with coding. - Mario Niepel

2 Answers

2
votes

You can use across:

library(dplyr)
library(stringr)

df %>% filter(Reduce(`|`, across(everything(), ~ str_detect(., "plate"))))

#  col1      col2            col3   
#  <chr>     <chr>           <chr>  
#1 plate_ABC text            text   
#2 text      this is plate B text   
#3 text      text            C-plate

Or rowwise:

df %>%
  rowwise() %>%
  filter(any(str_detect(c_across(everything()), 'plate')))

If you have an older version of dplyr (< 1.0.0), you can use filter_all/filter_at:

df %>% filter_all(any_vars(str_detect(., 'plate')))
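
With dplyr 1.0.4 or later, if_any() should express the same "any column matches" test without an explicit Reduce, and its column selection also covers the "some columns" case from the question. A minimal sketch, rebuilding the question's df so the block runs on its own (the subset c(col1, col3) is only an illustration, not something the question specifies):

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- tibble(
  col1 = c("plate_ABC", "text", "text", "text"),
  col2 = c("text", "this is plate B", "text", "text"),
  col3 = c("text", "text", "C-plate", "text")
)

# Keep rows where any column contains "plate"
all_cols <- df %>%
  filter(if_any(everything(), ~ str_detect(.x, "plate")))

# Keep rows where col1 or col3 contains "plate" (col2 is not searched)
some_cols <- df %>%
  filter(if_any(c(col1, col3), ~ str_detect(.x, "plate")))
```

Any tidyselect helper (for example starts_with("col")) can be used in place of everything() to choose the columns to search.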
2
votes

We can use base R to do this

df[Reduce(`|`, lapply(df, grepl, pattern = 'plate')),]

-output

# A tibble: 3 x 3
#  col1      col2            col3   
#  <chr>     <chr>           <chr>  
#1 plate_ABC text            text   
#2 text      this is plate B text   
#3 text      text            C-plate
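
For readers new to Reduce, the one-liner above can be unpacked into separate steps. This is only a sketch using a plain data.frame with the question's values; the intermediate names hits and keep are made up for illustration:

```r
# Same values as in the question, as a base R data.frame
df <- data.frame(
  col1 = c("plate_ABC", "text", "text", "text"),
  col2 = c("text", "this is plate B", "text", "text"),
  col3 = c("text", "text", "C-plate", "text"),
  stringsAsFactors = FALSE
)

# Step 1: one logical vector per column, TRUE where "plate" is found
hits <- lapply(df, grepl, pattern = "plate")

# Step 2: combine the per-column vectors element-wise with OR,
# so a row is TRUE if any of its columns matched
keep <- Reduce(`|`, hits)

# Step 3: subset the rows
result <- df[keep, ]
```

To search only some columns, the same steps can be applied to a subset such as df[c("col1", "col3")] in step 1.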

Or using rowSums

df[rowSums(`dim<-`(grepl('plate', as.matrix(df)), dim(df))) > 0,]
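
The `dim<-` call in the rowSums version is there because grepl returns a flat logical vector and drops the matrix shape. A step-by-step sketch of the same idea, again with a plain data.frame and illustrative variable names (m, hits, keep):

```r
# Same values as in the question, as a base R data.frame
df <- data.frame(
  col1 = c("plate_ABC", "text", "text", "text"),
  col2 = c("text", "this is plate B", "text", "text"),
  col3 = c("text", "text", "C-plate", "text"),
  stringsAsFactors = FALSE
)

# Character matrix with the same shape as df
m <- as.matrix(df)

# grepl searches every cell but returns a plain vector (column by column)
hits <- grepl("plate", m)

# Restore the rows-by-columns shape so rowSums can count matches per row
dim(hits) <- dim(m)

# A row is kept if at least one of its cells matched
keep <- rowSums(hits) > 0
result <- df[keep, ]
```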

Or using tidyverse

library(dplyr)
library(purrr)
library(stringr)
df %>%
   filter(across(everything(), ~ str_detect(., 'plate')) %>% 
           reduce(`|`))
# A tibble: 3 x 3
#  col1      col2            col3   
#  <chr>     <chr>           <chr>  
#1 plate_ABC text            text   
#2 text      this is plate B text   
#3 text      text            C-plate

Benchmarks

Timings on a larger dataset (the four rows of df repeated 1e6 times, giving 4 million rows):

df1 <- df[rep(seq_len(nrow(df)), 1e6), ]
system.time(df1 %>% filter(Reduce(`|`, across(.fns = ~str_detect(., "plate")))))
#   user  system elapsed 
#  1.597   0.139   1.736 

system.time(df1 %>%
  rowwise() %>%
  filter(any(str_detect(c_across(), 'plate'))))
#    user  system elapsed 
# 178.694   1.477 180.864 

system.time(df1 %>% filter_all(any_vars(str_detect(., 'plate'))))
#   user  system elapsed 
#  1.461   0.061   1.499 

system.time(df1[Reduce(`|`, lapply(df1, grepl, pattern = 'plate')), ])
#   user  system elapsed 
#  2.792   0.025   2.778 

system.time(df1 %>%
  filter(across(everything(), ~ str_detect(., 'plate')) %>% 
    reduce(`|`)))
#   user  system elapsed 
#  1.471   0.054   1.505