1
votes

I have the following data frame and would like to create a new variable Var3 based on values from Var2, year, and month. My data is in long format, thus every person (ID) has multiple rows (i.e., one row per measurement occasion). For the creation of the new variable, therefore, the data has to be grouped by ID, I guess.

dd <- read.table(text=
"ID    Var1   Year      Month    Var2
1      0      2008      2       -4.17      
1      0      2009      2       -3.17
1      0      2010      2       -2.17
1      0      2011      2       -1.17
1      1      2013      2        0.17
1      0      2014      10       2.84
2      0      2008      2       -3.33      
2      0      2010      2       -1.33
2      1      2011      2        0.33
2      0      2013      2        5.43
2      0      2015      11       6.43
3      1      2010      2        0.85
3      0      2011      2        2.33
3      0      2013      2        4.86
3      0      2015      11       6.33
3      0      2017      10       4.12", header=TRUE) 

The new variable should be built following specific rules:

  1. If Var1 == 1, then Var3 == Var2
  2. If Var2 < 0, then Var3 == Var2
  3. If Var2 >= 0, then the following procedure should be used to create Var 3:

For the first row in which Var2 >= 0, take value of Var2 in the row with Var1 == 1 (e.g. 0.17 for ID 1), add the result of this equation: the value of year from the first row below Var1 == 1 (e.g., 2014 for ID 1) minus the value of year from the row with Var1 == 1 (e.g., 2013 for ID 1) Further add the result of this equation: the value of month from the first row below Var1 == 1 (e.g., 10 for ID 1) minus the value of month from the row with Var1 == 1 (e.g., 2 for ID 1) divided by 12.

For the second row in which Var2 >= 0, take value of Var2 in the row with Var1 == 1 (e.g. 0.33 for ID 2), add the result of this equation: the value of year from the second row below Var1 == 1 (e.g., 2015 for ID 2) minus the value of year from the row with Var1 == 1 (e.g., 2011 for ID 2) Further add the result of this equation: the value of month from the second row below Var1 == 1 (e.g., 2 for ID 2) minus the value of month from the row with Var1 == 1 (e.g., 2 for ID 2) divided by 12.

...and so on for the third row in which Var2 >= 0 etcetera...

The final data frame should look like this:

dd_new <- read.table(text=
"ID    Var1   Year      Month    Var2     Var3
1      0      2008      2       -4.17    -4.17       
1      0      2009      2       -3.17    -3.17 
1      0      2010      2       -2.17    -2.17 
1      0      2011      2       -1.17    -1.17 
1      1      2013      2        0.17     0.17 
1      0      2014      10       2.89     1.836667  
2      0      2008      2       -3.33    -3.33           
2      0      2010      2       -1.33    -1.33     
2      1      2011      2        0.33     0.33     
2      0      2013      2        5.43     2.33        
2      0      2015      11       6.43     5.08    
3      1      2010      2        0.67     0.67     
3      0      2011      2        2.33     1.67    
3      0      2013      2        4.86     3.67    
3      0      2015      11       6.33     6.42          
3      0      2017      10       4.12     8.336667", header=TRUE) 

Unfortunately, I do not know how to create a new variable using values from rows below a certain row (i.e. below the row in which Var1 == 1). Can anybody help me out?

Thanks a lot!

1

1 Answers

1
votes

EDIT

This is how you can create Var3 following the updated specific rules

library(dplyr)

dd %>% 
  group_by(ID) %>% 
  mutate(
    Var3 = ifelse(Var1 == 1 | Var2 < 0, Var2, Var2[Var1 == 1] + Year[row_number()] - Year[Var1 == 1] + (Month[row_number()] - Month[Var1 == 1])/12)
  )

Output

# A tibble: 16 x 6
# Groups:   ID [3]
      ID  Var1  Year Month  Var2  Var3
   <int> <int> <int> <int> <dbl> <dbl>
 1     1     0  2008     2 -4.17 -4.17
 2     1     0  2009     2 -3.17 -3.17
 3     1     0  2010     2 -2.17 -2.17
 4     1     0  2011     2 -1.17 -1.17
 5     1     1  2013     2  0.17  0.17
 6     1     0  2014    10  2.89  1.84
 7     2     0  2008     2 -3.33 -3.33
 8     2     0  2010     2 -1.33 -1.33
 9     2     1  2011     2  0.33  0.33
10     2     0  2013     2  5.43  2.33
11     2     0  2015    11  6.43  5.08
12     3     1  2010     2  0.67  0.67
13     3     0  2011     2  2.33  1.67
14     3     0  2013     2  4.86  3.67
15     3     0  2015    11  6.33  6.42
16     3     0  2017    10  4.12  8.34