I would like to know how to extract the values in one column of a data frame (data frame A) based on certain column names in data frame A containing the values of multiple columns from another data frame (data frame B).
More specifically. I have two data frames:
Data frame A contains combinations of birth defects. Each row is a different combination, and each column is the number of a defect included in that combination.
# Combinations data frame
combos <- data.frame("combo_no"=c(1:4),
"Defect_A" = c(1,1,1,1),
"Defect_B" = c(3,2,3,4),
"Defect_C" = c(4,4,NA,7),
"Defect_D" = c(5,5,NA,8),
"Defect_E" = c(6,6,NA,NA))
Data frame B contains individual cases. The first column has a unique identifier (CASE_ID). The rest of the columns are the numbers of specific birth defects, with “1” for “birth defect present” and “0” for “not present.”
# Cases data set
set.seed(99)
CASE_ID = c(1001:1005)
case1 = sample(0:1, 10, replace=TRUE)
case2 = sample(0:1, 10, replace=TRUE)
case3 = sample(0:1, 10, replace=TRUE)
case4 = sample(0:1, 10, replace=TRUE)
case5 = sample(0:1, 10, replace=TRUE)
def<-data.frame(rbind(case1, case2, case3, case4, case5))
colnames(def)<- c(1:10)
cases<-cbind(CASE_ID,def)
Desired Output: I would like to obtain the list of CASE_IDs from data frame A that have a combination of birth defects from data frame B. I’d also like to specify which combination is present. Ideally, the output would look as follows:
# Desired Output
output <- data.frame("CASE_ID" = c(1002,1003),
"combo_no" = c(3,1))
Thank you for your help.