Filter group only when both levels are present

Question

This feels like it should be straightforward and I'm just missing something. The goal is to filter the data into a new df that keeps only the groups in which both var values, 1 and 2, are represented.

Here's some toy data:

grp <- c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E",2))
var <- c(1,1,2,1,1,2,1,2,2,2)
id <- c(1:10)
df <- as.data.frame(cbind(id, grp, var))

Only grp A and C should be present in the new data, because they are the only groups where both var values 1 and 2 are present.

I tried dplyr, but obviously '&' won't work: the condition is evaluated row by row, and no single row can have var equal to both 1 and 2. Using '|' instead just returns the same df:

df.new <- df %>% group_by(grp) %>% filter(var==1 & var==2) #returns no rows

Hello. If var can only be 1 or 2 you can: df %>% group_by(grp) %>% filter(n_distinct(var) == 2)... — Ika8

www www · Accepted Answer · 2018-11-26T16:50:08

Here is another dplyr method. This can work for more than two factor levels in var.

library(dplyr)

df2 <- df %>%
  group_by(grp) %>%
  filter(all(levels(var) %in% var)) %>%
  ungroup()
df2
# # A tibble: 5 x 3
#   id    grp   var  
#   <fct> <fct> <fct>
# 1 1     A     1    
# 2 2     A     1    
# 3 3     A     2    
# 4 6     C     2    
# 5 7     C     1
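
One caveat worth sketching: levels(var) only returns something when var is a factor, as it is in the output above. From R 4.0.0 on, as.data.frame() no longer converts strings to factors by default, so the cbind-based df from the question ends up with character columns; levels(var) is then NULL and all(NULL %in% var) is TRUE, which keeps every group. A minimal variant that does not depend on factors, assuming the required values are simply all distinct values of var in the data (the name `required` is just an illustrative choice, not part of the original answer):

```r
library(dplyr)

# Toy data from the question, built with data.frame() so that id and var stay
# numeric and grp stays character (no factors involved).
df <- data.frame(
  id  = 1:10,
  grp = c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E", 2)),
  var = c(1, 1, 2, 1, 1, 2, 1, 2, 2, 2)
)

# Assumption: the values every kept group must contain are all distinct
# values of var that occur anywhere in the data.
required <- unique(df$var)

df2 <- df %>%
  group_by(grp) %>%
  filter(all(required %in% var)) %>%  # keep a group only if it has every required value
  ungroup()
df2
```

If only specific values matter, `required` could be set by hand, for example `required <- c(1, 2)`.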
