Filter by combination of (row) pairs

Question

I have a dataframe in a long format and I want to filter pairs based on unique combinations of values. I have a dataset that looks like this:

id <- rep(1:4, each=2)
type <- c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
df <- data.frame(id,type)
df
  id   type
1  1   blue
2  1   blue
3  2    red
4  2 yellow
5  3   blue
6  3    red
7  4    red
8  4 yellow

Let's say each id is a respondent and type is a combination of treatments. Individual 1 saw two objects, both of them blue; individual 2 saw one red object and a yellow one; and so on.

How do I keep, for example, those that saw the combination "red" and "yellow"? If I filter by the combination "red" and "yellow" the resulting dataset should look like this:

  id   type
3  2    red
4  2 yellow
7  4    red
8  4 yellow

It should keep respondents number 2 and number 4 (only those that saw the combination "red" and "yellow"). Note that it does not keep respondent number 3 because she saw "blue" and "red" (instead of "red" and "yellow"). How do I do this?

One solution is to reshape the dataset into a wide format, filter it by column, and restack again. But I am sure there is another way to do it without reshaping the dataset. Any idea?

David David · Accepted Answer · 2018-01-12T11:50:44

A dplyr solution would be:

library(dplyr)
df <- data_frame(
  id = rep(1:4, each = 2),
  type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)

types <- c("red", "yellow")

df %>% 
  group_by(id) %>% 
  filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow

Update

Allowing for the equal combinations, e.g. blue, blue, we have to change the filter-call to the following:

types2 <- c("blue", "blue")

df %>% 
  group_by(id) %>% 
  filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups:   id [1]
#>      id  type
#>   <int> <chr>
#> 1     1  blue
#> 2     1  blue

This solution also allows different types

df %>% 
  group_by(id) %>% 
  filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups:   id [2]
#>      id   type
#>   <int>  <chr>
#> 1     2    red
#> 2     2 yellow
#> 3     4    red
#> 4     4 yellow

Filter by combination of (row) pairs

4 Answers

Update