I have daily data of rainfall from 10 locations across 10 years
set.seed(123)
df <- data.frame(loc.id = rep(1:10, each = 10*365),years = rep(rep(2001:2010,each = 365),times = 10),
day = rep(rep(1:365,times = 10),times = 10), rain = runif(min = 0 , max = 35, 10*10*365))
I have a separate data frame that has certain days using which I want to sum the rainfall in df
df.ref <- data.frame(loc.id = rep(1:10, each = 10),
years = rep(2001:2010,times = 10),
index1 = rep(250,times = 10*10),
index2 = sample(260:270, size = 10*10,replace = T),
index3 = sample(280:290, size = 10*10,replace = T),
index4 = sample(291:300, size= 10*10,replace = T))
df.ref
loc.id years index1 index2 index3 index4
1: 1 2001 250 264 280 296
2: 1 2002 250 269 284 298
3: 1 2003 250 268 289 293
4: 1 2004 250 266 281 295
5: 1 2005 250 260 289 293
What I want to is for row in in df.ref
, use the index
values in df.ref
and
sum the rainfall in df
between index1 to index2, index1 to index3 and index1 to index4. For example:
Using df.ref
, for loc.id = 1 and year == 2001, sum the rainfall in df
from 250 to 264, 250 to 280, 250 to 296 (as shown in df.ref
)
Similarly, for year 2002, for loc.id = 1, sum the rainfall from 250 to 269, 250 to 284, 250 to 298.
I did this:
library(dplyr)
ptm <- proc.time()
dat <- df.ref %>% left_join(df)
index1.cal <- dat %>% group_by(loc.id,years) %>% filter(day >= index1 & day <= index2) %>% summarise(sum.rain1 = sum(rain))
index2.cal <- dat %>% group_by(loc.id,years) %>% filter(day >= index1 & day <= index3) %>% summarise(sum.rain2 = sum(rain))
index3.cal <- dat %>% group_by(loc.id,years) %>% filter(day >= index1 & day <= index4) %>% summarise(sum.rain3 = sum(rain))
all.index <- index1.cal %>% left_join(index2.cal) %>% left_join(index3.cal))
proc.time() - ptm
user system elapsed
2.36 0.64 3.06
I am looking to make my code faster since my actual df.ref
is quite large. Could anyone advise me how to make this quicker.
subset(df.ref, loc.id == 1 & years == 2001)
should it be like that? – pogibasdf
from 250 to 264, 250 to 280, 250 to 296" Do you mean sum the rainfall indf
from rows 250 to 264, 250 to 280, etc.? – Maurits Eversdata.table
– eddi