1
votes

I would like to extract rows from a data frame on the basis of values from two (or more) columns, where the columns contain the same character strings. For example, in the data frame below, I would like to extract all the rows that contain “jam” and “fish”.

This is sort of close, but it focuses on a single column rather than selecting columns on 2 or more columns.

df <- structure(list(num = 1:4, 
                     term_1 = c("jam", "bananna", "fish", 
                                           "carrot"), 
                     term_2 = c("fish", "jam", "apple", "halva"), 
                     term_3 = c("halva", "fish", "carrot", "fish")), 
                row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

df

#>   num  term_1 term_2 term_3
#> 1   1     jam   fish  halva
#> 2   2 bananna    jam   fish
#> 3   3    fish  apple carrot
#> 4   4  carrot  halva   fish

I thought something like this might work, but it doesn't produce what I'm looking for.

df %>% 
  filter(term_1 == c("jam", "fish") | term_2 == c("jam", "fish") | 
           term_3 == c("jam", "fish"))

I’ve tried also tried some other filter() variants--along with if_all()--and str_select with filter(), but I'm missing some detail. Any direction will be appreciated.

Created on 2021-07-12 by the reprex package (v2.0.0)

Created on 2021-07-12 by the reprex package (v2.0.0)

3
It would be interesting to see the expected outputakrun
It would look someting like: 1 jam fish halva . I have no idea how to format this in the comments. Apologies.avgoustisw
What is your expected outputakrun
The expected output would be rows 1 and 2 in the data frame.avgoustisw

3 Answers

2
votes

We could use

library(dplyr)
df %>%
    filter(if_any(-num, ~ . %in% "jam") & 
       if_any(-num, ~ . %in% "fish"))
# A tibble: 2 x 4
    num term_1  term_2 term_3
  <int> <chr>   <chr>  <chr> 
1     1 jam     fish   halva 
2     2 bananna jam    fish  
1
votes

You can also use rowSums:

df%>%filter(rowSums(across(-num, ~.%in% c("fish", 'jam')))>=2)

# A tibble: 2 x 4
    num term_1  term_2 term_3
  <int> <chr>   <chr>  <chr> 
1     1 jam     fish   halva 
2     2 bananna jam    fish  
1
votes

You can use rowSums as -

df[rowSums(df[-1] == 'fish') > 0 & rowSums(df[-1] == 'jam') > 0, ]

#   num term_1  term_2 term_3
#  <int> <chr>   <chr>  <chr> 
#1     1 jam     fish   halva 
#2     2 bananna jam    fish