3
votes

I have two data frames, x and y. The data frame x has a range of dates while data frame y has individual dates. I want to get the sum of the individual date values for the time ranges in data frame x. Thus id a would have the sum of all the values from 2019/1/1 through 2019/3/1.

id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)

dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)

y <- data.frame(dates, values)

Desired output

id start_date end_date  sum
a  2019/1/1   2019/3/1  221.8892
2

2 Answers

2
votes

One base R option is using apply

x$sum <- apply(x, 1, function(v) sum(subset(y,dates >= v["start_date"] & dates<=v["end_date"])$values))

such that

> x
  id start_date   end_date      sum
1  a 2019-01-01 2019-03-01 196.0311
2  b 2019-02-01 2019-04-01 185.6970
3  c 2019-03-01 2019-05-01 173.6429

Data

set.seed(1234)
id <- c("a","b","c")
start_date <- as.Date(c("2019/1/1", "2019/2/1", "2019/3/1"))
end_date <- as.Date(c("2019/3/1", "2019/4/1", "2019/5/1"))
x <- data.frame(id, start_date, end_date)

dates <- seq(as.Date("2019/1/1"),as.Date("2019/5/1"),1)
values <- runif(121, min=0, max=7)

y <- data.frame(dates, values)
1
votes

There are many ways of doing this. One possibility would be:

library(data.table)
x <- setDT(x)

# create a complete series for each id
x <- x[, .(dates = seq(start_date, end_date, 1)), by=id]


# merge the data
m <- merge(x, y, by="dates")

# get the sums
m[, .(sum = sum(values)), by=id]
   id      sum
1:  a 196.0311
2:  b 185.6970
3:  c 173.6429

You can add setseed before you create the random variables to exactly replicate the numbers

set.seed(1234)