3
votes

I have two datasets dat1 and dat2. I would like to pull out rows from dat1 which match the pairs of variables from dat2. var6 can be matched in any of var1, var2, var3, and var4. var7 must be matched with var5.

I would like to come up with a solution using the map functions from the purr package in tidyverse but I'm not sure where to start. Thank you for any help!

dat1 <- data.frame(id = c(1:9), 
                   var1 = c("x","x","x","y","y","y","z","z","z"),
                   var2 = c("c","c","c","d","d","d","e","e","e"),
                   var3 = c("f","f","f","g","g","g","h","h","h"),
                   var4 = c("i","i","i","j","j","j","k","k","k"),
                   var5 = c("aa","aa","aa","aa","aa","aa","bb","bb","bb"), stringsAsFactors = FALSE)

dat2 <- data.frame(var6 = c("c", "d", "l", "m", "n"),
                   var7 = c("aa", "bb", "aa", "aa","aa"), stringsAsFactors = FALSE)

In this example the result would pull out rows 1, 2, and 3 from dat1 as "c" is matched in var2 and "aa" is matched in var5.

1

1 Answers

2
votes

If we need an elementwise comparison, loop through the column 2 to 5 in 'dat1' with lapply, then do an elementwisse comparison with 'var6' of 'dat2' using outer while doing the same comparison with 'var5', 'var7' columns from 'dat1', 'dat2' respectively, check whether we get both as TRUE (&), then take the row wise sum (rowSums) to collapse the matrix into a single logical vector and Reduce the list. of vectors into. a single vector with | i.e. checking whether any of the row elements are TRUE in each of the vectors. It is used for subsetting the rows ('i1')

i1 <- Reduce(`|`, lapply(dat1[2:5], function(x) 
     rowSums(outer(x, dat2$var6, `==`) & outer(dat1$var5, dat2$var7, `==`)) > 0 ))
dat1[i1,]
#  id var1 var2 var3 var4 var5
#1  1    x    c    f    i   aa
#2  2    x    c    f    i   aa
#3  3    x    c    f    i   aa

Or using map

library(purrr)
library(dplyr)
map(dat1[2:5], ~ outer(.x, dat2$var6, `==`) &
                 outer(dat1$var5, dat2$var7, `==`)) %>%
   reduce(`+`) %>% 
   rowSums %>%
   as.logical %>%
   magrittr::extract(dat1, ., )
#  id var1 var2 var3 var4 var5
#1  1    x    c    f    i   aa
#2  2    x    c    f    i   aa
#3  3    x    c    f    i   aa