0
votes

I want to summarize a value based on an index and a grouping. This is my toy data set:

vals<- 1:5 
grps <- c(1,1,2,2,2)
dts <- as.Date(c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-05"))
dfx <- as_tibble(cbind(vals,grps,dts))
colnames(dfx) <- c("vals","grps","dts")
(rslt <- dfx %>% 
  mutate(dts = as.Date(dts)) %>%
  arrange(dts) %>%
  group_by(grps) %>%
  mutate(dist = as.numeric(last(dts) - dts)))

The resulting df looks like this:

   vals  grps dts         dist
  <dbl> <dbl> <date>     <dbl>
1     1     1 2020-01-01     1
2     2     1 2020-01-02     0
3     3     2 2020-01-03     2
4     4     2 2020-01-04     1
5     5     2 2020-01-05     0

I would like to get a new variable for each row (e.g. through mutate()) which is a summary of all days within each group (grps) for every row smaller or same as the value given in the index (dist). In other words, I would like to get the summaries of the values (vals) for each daily interval before a certain deadline, while incorporating the grouping.

So my desired final result would be a column in my df (sum):

# A tibble: 5 x 5
# Groups:   grps [2]
   vals  grps dts         dist   sum
  <dbl> <dbl> <date>     <dbl> <dbl>
1     1     1 2020-01-01     1     3
2     2     1 2020-01-02     0     2
3     3     2 2020-01-03     2    12
4     4     2 2020-01-04     1     9
5     5     2 2020-01-05     0     5
1

1 Answers

1
votes

You can arrange the data by dist and take cumsum for each grps.

library(dplyr)

rslt %>%
  arrange(grps, dist) %>%
  group_by(grps) %>%
  mutate(sum = cumsum(vals))

#   vals  grps dts         dist   sum
#  <dbl> <dbl> <date>     <dbl> <dbl>
#1     2     1 2020-01-02     0     2
#2     1     1 2020-01-01     1     3
#3     5     2 2020-01-05     0     5
#4     4     2 2020-01-04     1     9
#5     3     2 2020-01-03     2    12