2
votes

Using initial values, I want to fill iteratively NAs in a data.table column based on growth rates stored in a separate column, by id.

Take the following data.table as an example:

library(data.table)
DT <- data.table(id = c("A","A","A","A","B","B","B","B"),  date=1:4,
 growth=1L+runif(8), index= c(NA,250,NA,NA,NA,300,NA,NA))

> DT
   id date   growth index
1:  A    1 1.654628    NA
2:  A    2 1.770219   250
3:  A    3 1.255893    NA
4:  A    4 1.185985    NA
5:  B    1 1.826187    NA
6:  B    2 1.055251   300
7:  B    3 1.180389    NA
8:  B    4 1.204108    NA

Basically, what I need id for index values after date 2:

index_{i,t} = growth_{i,t}*index_{i,t-1}

And, for values before date 2:

index_{i,t} = index_{i,t-1}/growth_{i,t-1}

I had a go using shift, but this replace just index at t+1:

DT[, index := growth * shift(index,1L, type="lag")]  

UPDATE The desired result looks like that

> DT
   id date   growth    index
1:  A    1 1.440548 141.2255
2:  A    2 1.395092 250.0000
3:  A    3 1.793094 313.9733
4:  A    4 1.784224 372.3676
5:  B    1 1.129264 284.2926
6:  B    2 1.978359 300.0000
7:  B    3 1.228979 354.1167
8:  B    4 1.453433 426.3948
1
sounds like you'd want to subset by date and group by id; e.g., like DT[date > 2, index2 := growth * shift(index,1L, type="lag"), by = id] and DT[date < 2, index2 := shift(index,1L) / shift(growth, 1L), by = id] . That makes all NA for your example data. Is your goal to fill all the NAs? - Chris Holbrook
@ChrisHolbrook: Indeed, the goal is to fill all the NAs. The issue is here that with the shift command we may fill one observation. But as this newly filled observation is needed for the next filling the command doesn't meet the goal, right? - Luks

1 Answers

1
votes

First, we'll define a function which takes two vectors, values and growths, that

  1. Finds the first non-NA value in values
  2. Determines the ratio of each element in values to the non-NA one by multiplying all the growths between it and the non-NA.
  3. Does that multiplication

Note that this won't catch situations where there's more than one non-NA value, and it will error if values only has NAs. But I leave exception-handling to you, because you'll know best what to do.

apply_growth <- function(values, growths) {
  given <- which(!is.na(values))[1]

  cumulative_growth <- vapply(
    X = seq_along(growths),
    FUN.VALUE = numeric(1),
    FUN = function(x) {
      if (x < given) {
        1 / prod(growths[seq(x + 1, given)])
      } else if (x > given) {
        prod(growths[seq(given + 1, x)])
      } else if (x == given) {
        1
      }
    }
  )

  values[given] * cumulative_growth
}

Now we'll apply it to each subgroup of DT. Just to be sure, we'll specify the rows have to be ordered by date.

DT[
  order(date),
  index := apply_growth(index, growth),
  by = id
]

DT
#    id date   growth    index
# 1:  A    1 1.993863 180.7514
# 2:  A    2 1.383115 250.0000
# 3:  A    3 1.350102 337.5256
# 4:  A    4 1.863802 629.0809
# 5:  B    1 1.664999 249.2398
# 6:  B    2 1.203660 300.0000
# 7:  B    3 1.595310 478.5931
# 8:  B    4 1.002311 479.6989