I would like to match multiple conditions from independent data tables onto my main data table.
How can I do this using the data.table
package?
What would be the most efficient/fastest way?
I have a mock example, with some mock conditions here to illustrate my question:
main_data <- data.frame( pnum = c(1,2,3,4,5,6,7,8,9,10),
age = c(24,35,43,34,55,24,36,43,34,54),
gender = c("f","m","f","f","m","f","m","f","f","m"))
data_1 <- data.frame( pnum = c(1,4,5,8,9),
value_data_1 = c(1, 2, 1, 1, 1),
date = as.Date(c("2019-01-01", "2018-07-01", "2018-01-01", "2016-07-01", "2016-07-01")))
data_2 <- data.frame( pnum = c(1,5,7,8,9),
value_data_2 = c(1, 2, 1, 1, 2),
date = as.Date(c("2019-01-01", "2018-07-01", "2018-01-01", "2016-07-01", "2016-07-01")))
I would like to create a new variable in my main_data table called "matching" of those rows that match between data_1 and data_2 under multiple conditions:
- First, the value of data_1$value_data_1 has to be equal to 1.
- Second, the value of data_2$value_data_2 also has to be equal to 1.
- Third, the pnum and the date should match between data_1 and data_2.
When all these conditions are met, I would expect the new output of main_data to look like this:
> main_data
pnum age gender matching
1 1 24 f 1
2 2 35 m 0
3 3 43 f 0
4 4 34 f 0
5 5 55 m 0
6 6 24 f 0
7 7 36 m 0
8 8 43 f 1
9 9 34 f 0
10 10 54 m 0
Until now, I programmed each condition seperately and created new placeholder tables in between, but this is not very memory efficient. Is there an efficient way to chain all the conditions using the data.tables
package specifically?