I am trying to construct a group-specific variable that is:
- NA for observations until a condition is met.
- NA for the first observation when the condition is met.
- For subsequent observations, the value of the last prior observation when the condition was met.
Below is a MWE. I am looking for code to construct "last_value". I would much prefer to use data.table (or dplyr) for scalability/performance but am open to other suggestions. Thanks!
library(data.table)
data.table(
group = c(rep("alpha", 4), rep("beta", 3)),
time = c(1:4, 1:3),
condition = c("Yes", "No", "Yes", "No", "No", "Yes", "No"),
value = 1:7,
last_value = c(NA, 1, 1, 3, NA, NA, 6))
# group time condition value last_value
# 1: alpha 1 Yes 1 NA
# 2: alpha 2 No 2 1
# 3: alpha 3 Yes 3 1
# 4: alpha 4 No 4 3
# 5: beta 1 No 5 NA
# 6: beta 2 Yes 6 NA
# 7: beta 3 No 7 6
# last_value is:
# NA in the 1st row as that is the 1st observation for group "alpha"
# 1 in the 2nd row as the 1st observation is condition = "Yes"
# 1 in the 3rd row as the 1st observation is condition = "Yes"; 2nd observation is condition = "No"
# 3 in the 4rd row as the 3rd observation is condition = "Yes"
# NA in the 5th row as that is the 1st observation for group "beta"
# NA in the 6th row as there is no prior observation with condition = "Yes"
# 6 in the 7th row as the 6th observation is condition = "Yes"