0
votes

This feels like it should be straightforward and I'm just missing something. The goal is to filter the data into a new df that keeps only the groups in which both var values, 1 and 2, are present.

Here's some toy data:

grp <- c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E",2))

var <- c(1,1,2,1,1,2,1,2,2,2)

id <- c(1:10)

df <- as.data.frame(cbind(id, grp, var))

Only grp A and C should be present in the new data, because they are the only groups in which both var values 1 and 2 occur.

I tried dplyr, but obviously '&' won't work, since what I need isn't a row-wise condition (no single row can have var equal to both 1 and 2), and '|' just returns the same df:

library(dplyr)
df.new <- df %>% group_by(grp) %>% filter(var==1 & var==2) #returns no rows

2
Hello. If var can only be 1 or 2 you can: df %>% group_by(grp) %>% filter(n_distinct(var) == 2)... - Ika8
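
A minimal sketch of the n_distinct() suggestion in the comment above, run on the question's toy data. It assumes the data is rebuilt with data.frame() so that var stays numeric; the names df_num and both are illustrative and not from the original post.

```r
library(dplyr)

# Toy data from the question, rebuilt so that var stays numeric
# (df_num is an illustrative name, not from the original post)
df_num <- data.frame(
  id  = 1:10,
  grp = c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E", 2)),
  var = c(1, 1, 2, 1, 1, 2, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Keep only the groups that contain exactly two distinct values of var
both <- df_num %>%
  group_by(grp) %>%
  filter(n_distinct(var) == 2) %>%
  ungroup()

unique(both$grp)
```

The count alone does not check which values are present: a group holding 1 and 3 would also pass, which is why the comment restricts this to the case where var can only be 1 or 2.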

2 Answers

4
votes

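Here is another dplyr method. This can work for more than two factor levels in var, but it relies on var being a factor: levels() returns NULL for a character or numeric column, and all() of an empty comparison is TRUE, so the filter would then keep every group. In R 4.0.0 and later, as.data.frame(cbind(...)) produces character columns rather than factors, so convert var with factor() first.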

library(dplyr)

df2 <- df %>%
  group_by(grp) %>%
  filter(all(levels(var) %in% var)) %>%
  ungroup()
df2
# # A tibble: 5 x 3
#   id    grp   var  
#   <fct> <fct> <fct>
# 1 1     A     1    
# 2 2     A     1    
# 3 3     A     2    
# 4 6     C     2    
# 5 7     C     1 
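
A variant of the same idea that does not depend on var being a factor could list the required values explicitly. This is only a sketch: the vector required and the names df_num and df_req are hypothetical, and var is assumed to be numeric.

```r
library(dplyr)

# Toy data from the question, rebuilt so that var stays numeric
# (df_num is an illustrative name, not from the original post)
df_num <- data.frame(
  id  = 1:10,
  grp = c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E", 2)),
  var = c(1, 1, 2, 1, 1, 2, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Values that every retained group must contain (hypothetical helper vector)
required <- c(1, 2)

# all(required %in% var) is evaluated once per group and yields a single
# TRUE/FALSE, so each group is kept or dropped as a whole
df_req <- df_num %>%
  group_by(grp) %>%
  filter(all(required %in% var)) %>%
  ungroup()

df_req
```

Adding more values to required extends the check without changing the pipeline.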
1
votes

We can condition on there being at least one instance of var == 1 and at least one instance of var == 2 by doing the following:

library(tidyverse)
df1 <- tibble(grp, var, id) # tibble() replaces the deprecated data_frame(); avoids coercion to character/factor

df1 %>%
    group_by(grp) %>%
    filter(sum(var == 1) > 0 & sum(var == 2) > 0)

  grp     var    id
  <chr> <dbl> <int>
1 A         1     1
2 A         1     2
3 A         2     3
4 C         2     6
5 C         1     7
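
The "at least one instance" condition can also be written with any(), which may read a little more directly than comparing sums to zero. A sketch under the same assumptions as above (numeric var; df_any and df_both are illustrative names):

```r
library(dplyr)

# Same toy data as in the question, with var kept numeric
# (df_any and df_both are illustrative names, not from the original post)
df_any <- tibble(
  grp = c(rep("A", 3), rep("B", 2), rep("C", 2), rep("D", 1), rep("E", 2)),
  var = c(1, 1, 2, 1, 1, 2, 1, 2, 2, 2),
  id  = 1:10
)

# any() returns one TRUE/FALSE per group, so && combines two scalars
df_both <- df_any %>%
  group_by(grp) %>%
  filter(any(var == 1) && any(var == 2)) %>%
  ungroup()

df_both
```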