0
votes

I am currently working R in data.table and am looking for an easy way to implement a rolling lag sum. I can find posts on lags and posts on various sum functions but haven't been successful finding one in which sum and lag are combined in the way I am looking to implement it (rolling back 3 days).

I have a data set that resembles the following-

id  agedays  diar
1    1        1
1    2        0
1    3        1
1    4        1
1    5        0
1    6        0
1    7        0
1    8        1
1    9        1
1    10       1
3    2        0
3    5        0
3    6        0
3    8        1
3    9        1
4    1        0
4    4        0
4    5        0
4    6        1
4    7        0

I want to create a variable "diar_prev3" that holds the rolling sum of diar for the past 3 days prior to the current agedays value. Diar_prev3 would be NA for the rows in which agedays < 4 The data set would look like the following :

id  agedays  diar  diar_prev3
1    1        1      NA
1    2        0      NA
1    3        1      NA
1    4        1      2
1    5        0      2
1   6        0      2
1    7        0      1
1    8        1      0
1    9        1      1
1    10       1      2
3    2        0      NA
3    5        0      0
3    6        0      0
3    8        1      0
3    9        1      1
4    1        0      NA
4    4        0      0
4    5        0      0
4    6        1      0
4    7        0      1

I have tried a basic lag function, but am unsure how to implement this with a rolling sum function included. Does anyone have any functions they recommend using to accomplish this?

****Edited to fix an error with ID==2

1
Try zoo::rollsumAllan Cameron
Don't you want to do this by id ? How is value at row 6 2?Ronak Shah

1 Answers

3
votes

I don't get the logic; it does not appear to be by id, otherwise the results for id==2 don't make sense - but what is going on with id==3 and 4?

In principle, you could do something like this - either by ID or not:

library(data.table)
library(RcppRoll)
DT <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), 
                     agedays = c(1L, 2L, 
                                 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 5L, 6L, 8L, 9L, 1L, 4L, 
                                 5L, 6L, 7L), diar = c(1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 
                                                       0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L)), 
                class = "data.frame", row.names = c(NA, -20L))

setDT(DT)
DT[, diar_prev3 := ifelse(agedays < 4, NA, RcppRoll::roll_sum(lag(diar, 1), n=3L, fill=NA, align = "right"))][]
#>     id agedays diar diar_prev3
#>  1:  1       1    1         NA
#>  2:  1       2    0         NA
#>  3:  1       3    1         NA
#>  4:  1       4    1          2
#>  5:  1       5    0          2
#>  6:  2       6    0          1
#>  7:  2       7    0          0
#>  8:  2       8    1          1
#>  9:  2       9    1          2
#> 10:  2      10    1          3
#> 11:  3       2    0         NA
#> 12:  3       5    0          1
#> 13:  3       6    0          0
#> 14:  3       8    1          1
#> 15:  3       9    1          2
#> 16:  4       1    0         NA
#> 17:  4       4    0          1
#> 18:  4       5    0          0
#> 19:  4       6    1          1
#> 20:  4       7    0          1
DT[, diar_prev3 := ifelse(agedays < 4, NA, RcppRoll::roll_sum(lag(diar, 1), n=3L, fill=NA, align = "right")), by=id][]
#>     id agedays diar diar_prev3
#>  1:  1       1    1         NA
#>  2:  1       2    0         NA
#>  3:  1       3    1         NA
#>  4:  1       4    1          2
#>  5:  1       5    0          2
#>  6:  2       6    0         NA
#>  7:  2       7    0         NA
#>  8:  2       8    1          1
#>  9:  2       9    1          2
#> 10:  2      10    1          3
#> 11:  3       2    0         NA
#> 12:  3       5    0         NA
#> 13:  3       6    0          0
#> 14:  3       8    1          1
#> 15:  3       9    1          2
#> 16:  4       1    0         NA
#> 17:  4       4    0         NA
#> 18:  4       5    0          0
#> 19:  4       6    1          1
#> 20:  4       7    0          1

Created on 2020-07-20 by the reprex package (v0.3.0)