I have the following data set:
ID  DOB       STARTDATE  ENDDATE  FAILURE
1   10/10/75  5/10/84    15/5/03  0
2   15/9/76   10/3/84    19/6/92  0
2   15/9/76   22/2/93    15/1/99  0
2   15/9/76   15/4/99    15/1/03  0
- ID is the patient identifier
- DOB is their date of birth
- STARTDATE is when they entered the study
- ENDDATE is when they left the study
- FAILURE indicates whether they died
Some individuals have multiple start dates.
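For reproducibility, the data can be built in R roughly as follows. This is only a sketch: the name DATA matches the survSplit call further down, and I'm assuming the dates are day/month/two-digit-year, hence the "%d/%m/%y" format.

```r
# Example data from the table above; dates assumed to be day/month/year
fmt <- "%d/%m/%y"
DATA <- data.frame(
  ID        = c(1, 2, 2, 2),
  DOB       = as.Date(c("10/10/75", "15/9/76", "15/9/76", "15/9/76"), format = fmt),
  STARTDATE = as.Date(c("5/10/84", "10/3/84", "22/2/93", "15/4/99"), format = fmt),
  ENDDATE   = as.Date(c("15/5/03", "19/6/92", "15/1/99", "15/1/03"), format = fmt),
  FAILURE   = c(0, 0, 0, 0)
)
```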
I'm trying to manipulate it so that, in addition to the existing rows, there is a new row with a STARTDATE 10 years and 20 years after the DOB (for as long as the ENDDATE has not been reached). The result should look like this:
ID  DOB       STARTDATE  ENDDATE  FAILURE
1   10/10/75  5/10/84    15/5/03  0
2   15/9/76   10/3/84    19/6/92  0
2   15/9/76   14/9/86    19/6/92  0
2   15/9/76   22/2/93    15/1/99  0
2   15/9/76   15/9/96    15/1/99  0
2   15/9/76   15/4/99    15/1/03  0
- ID = 1 remains unchanged since its DOB-to-STARTDATE difference is already less than 10 years
- ID = 2 keeps its three current rows and gains two new rows: one 10 years after DOB and one 20 years after DOB
So far, I've tried to solve it by adding in a new column which calculates the age of entry (STARTDATE - DOB):
library(eeptools)
# Age at entry in years; DOB and STARTDATE must be of class Date
DATA$AGEENTRY <- age_calc(DATA$DOB, DATA$STARTDATE, units = "years")
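As a cross-check that doesn't depend on eeptools, the same STARTDATE - DOB difference can be approximated in base R. This is a rough sketch only: dividing by 365.25 is my assumption for the average year length, so the result may differ slightly from what age_calc returns. The vectors dob, start and age_entry are just illustrative names holding the first and third rows of the example.

```r
# Self-contained illustration using the first and third rows of the example
dob   <- as.Date(c("1975-10-10", "1976-09-15"))
start <- as.Date(c("1984-10-05", "1993-02-22"))

# Approximate age in years: elapsed days divided by an average year length
age_entry <- as.numeric(difftime(start, dob, units = "days")) / 365.25
round(age_entry, 2)
```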
I then ran survSplit as follows:
library(survival)
survSplit(DATA, cut = c(10, 20), end = "AGEENTRY",
          event = "FAILURE", start = "START")
I know that in Stata this can be done quite nicely with
stsplit newvariable, at(10(10)20)
However, my R attempt is not doing exactly what I was hoping for. I've been stuck on this problem for over a day now, so any help would be very much appreciated!