Here is my toy data. I want to calculate diff_var4; the column below shows the values I expect.

df <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4, ~diff_var4,
     1L,    1L,    1L,    2L,         NA,
     1L,    1L,    1L,    2L,         NA,
     1L,    2L,    1L,    2L,         0L,
     1L,    2L,    1L,    2L,         0L,
     1L,    4L,    1L,    2L,         0L,
     1L,    5L,    1L,    2L,         0L,
     1L,    6L,    2L,    8L,         6L,
     1L,    6L,    2L,    8L,         6L,
     2L,    4L,    1L,    5L,         NA,
     2L,    5L,    1L,    5L,         0L,
     2L,    5L,    1L,    5L,         0L,
     2L,    6L,    2L,    8L,         3L,
     2L,    6L,    2L,    8L,         3L)

var1 to var4 are the inputs, and I need to calculate diff_var4 so that:

Condition 1: within each var1, if var3 is 1 and var2 is at its minimum, diff_var4 is var4 - previous(var4), and that value is repeated for every observation with the same var2.

Condition 2: within each var1, if var3 changes, diff_var4 is var4 - previous(var4), and that value is repeated for every observation with the same var2.

I started with

library(dplyr)

df %>% group_by(var1) %>% 
  mutate(diff_var4 = var4-lag(var4))

but this does not produce the desired diff_var4, which should be NA in the 2nd row, 6 in the 8th row, and 3 in the last row.

How can I calculate diff_var4, preferably with a tidyverse solution?

1 Answer

The following solved the problem:

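library(dplyr)

df %>% group_by(var1) %>% 
  mutate(diff_var4 = var4-lag(var4)) %>%   # change from the previous row within var1
  group_by(var1, var2) %>% 
  mutate(diff_var4 = max(diff_var4))       # max() keeps NA, so the first var2 run stays NA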

I am still open to other solutions, if you have any.
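
One possible alternative, offered as a sketch rather than a definitive answer: max() only picks the right value while var4 never decreases within a var1 group, because a negative change would lose to the 0s in the same var2 run. Taking the first value of each run avoids that. This assumes the rows are already ordered as in the toy data and that each var2 value forms one contiguous run within var1; df_in and result are names made up for this example.

```r
library(dplyr)

# Toy input: the same var1..var4 values as in the question,
# without the expected diff_var4 column.
df_in <- tibble::tribble(
  ~var1, ~var2, ~var3, ~var4,
     1L,    1L,    1L,    2L,
     1L,    1L,    1L,    2L,
     1L,    2L,    1L,    2L,
     1L,    2L,    1L,    2L,
     1L,    4L,    1L,    2L,
     1L,    5L,    1L,    2L,
     1L,    6L,    2L,    8L,
     1L,    6L,    2L,    8L,
     2L,    4L,    1L,    5L,
     2L,    5L,    1L,    5L,
     2L,    5L,    1L,    5L,
     2L,    6L,    2L,    8L,
     2L,    6L,    2L,    8L)

result <- df_in %>%
  group_by(var1) %>%
  # Row-to-row change in var4 within each var1 (NA on the first row).
  mutate(diff_var4 = var4 - lag(var4)) %>%
  group_by(var1, var2) %>%
  # Repeat the change seen at the start of each var2 run across the run.
  mutate(diff_var4 = first(diff_var4)) %>%
  ungroup()

result
```

On the toy data this gives the same diff_var4 as the max() version. The two only differ when var4 drops between consecutive var2 runs.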