0
votes

I am working with a data frame (lets call is MyData) looking like below. What I want to do is to group by PatientKey and create a new Id called NewID. Every time for the same PatientKey that the TimeBetweenTests is > 14, the new Id should increase by 1, and stay on that particular new value until a new PatientKey shows up OR for the same patientKey a new TimeBetweenTests > 14 shows up.

PatientKey             TimeBetweenTests     NewId        
1                      0                    NewId should be 1 (first patient)
1                      0                    NewId should be 1
1                      1                    NewId should be 1                                                                
1                      2                    NewId should be 1
2                      3                    NewId should be 2 (new patient)                                                                          
3                      4                    NewId should be 3 (new patient)      
3                      16                   NewId should be 4 (same patient but TimeBetweenTests > 14)                                                                                              
3                      80                   NewId should be 5 (same patient but TimeBetweenTests > 14)
4                      3                    NewId should be 6 (new patient)
4                      0                    NewId should be 6 (new patient)                                                                            
4                      90                   NewId should be 7 (same patient but TimeBetweenTests > 14)        
4                      110                  NewId should be 8 (same patient but TimeBetweenTests > 14) 
5                      3                    NewId should be 9 (new patient)
5                      3                    NewId should be 9
5                      3                    NewId should be 9

etc    
                                                                       

I have tried using dplyr for this but the problem is that the subsequent values does not change when I try code similar to:

MyData <- MyData %>% group_by(PatientKey) %>% mutate(NewId = ifelse(TimeBetweenTests > 14, lag(NewId), NewId))

Anyone has a convenient dplyr or data.table solution for this, alternatively a for loop approach.

1

1 Answers

0
votes

Try this

library(dplyr)
df %>% mutate(NewID = cumsum(lag(PatientKey, default = 0) != PatientKey | TimeBetweenTests > 14)

Output

   PatientKey TimeBetweenTests NewID
        <dbl>            <dbl> <int>
 1          1                0     1
 2          1                0     1
 3          1                1     1
 4          1                2     1
 5          2                3     2
 6          3                4     3
 7          3               16     4
 8          3               80     5
 9          4                3     6
10          4                0     6
11          4               90     7
12          4              110     8
13          5                3     9
14          5                3     9
15          5                3     9