I am struggling days now to find a way to match two different tables (unbalanced in row number) in R with the matching range of dates.
The first data frame represents what someone has declared and the second data.frame represents what actually is.The problem is that an id say A might have kept for a week or more a amount of product in kilos in his/her storage and then appear it one week after.
So i have two data frames with different rows. Both number for rows exceed 500.000.The first has 500.000 and the second 520.000.(so some NAs might occur but this doesn't matter).
I want to seach and match from these 2 different data frames:
- the ids to are the same,
- the product code and the man code to be the same, and
- to check if the date in the second data frame is in the range of a month from the date in the first data frame (for example he/she declares a weight (100 kg) in 2020/1/1 and appears it in 2020/1/15 but the actual weight is 105).
and if all these conditions are true to sum the two weights (from the data frames 1 and 2)
The desired output must be a data frame with the matched id, range of date, the product , the man, the 2 columns weights and the last column to be the values that will be the sum of the weights.
the example data frame are the two data frames in R below :
id = rep("A",10)
date = seq(as.Date("2000/1/1"), by = "day", length.out = 10)
product = c("E1","E1","E2","E3","E3","E4","E1","E1","E5","E5")
man = c("PS","PS","PS","DDR","DDR","FFR","PS","PS","BA","BA")
weight = c(1505,300,259,231,140,150,300,112,203,2200)
data1 = data.frame(id,date,product,man,weight);data1
id2 = rep("A",10)
date2 = sample(seq(as.Date("2000/1/10"), by = "day", length.out = 10),10)
product2 = c("E3","E3","E1","E1","E1","E1","E5","E4","E5","E2")
man2 = c("DDR","DDR","PS","PS","PS","PS","BA","FFR","BA","PS")
weight2 = c(259,295,260,240,135,145,250,110,2000,210)
data2 = data.frame(id2,date2,product2,man2,weight2);data2
How can I match them with the ginned date latency condition ?
Any help ?