I'm trying to create new variables with mutate in dplyr, and I can't understand the error I get. I've tried everything and have not run into this issue before.
I have a large data set with over a million observations; below are only the first 20.
This is what my data looks like:
data1 <- read.table(header=TRUE, text="IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
7 3 04/09/06 2008 31/12/08 0 849
7 4 04/09/06 2009 31/12/09 0 1214
7 5 04/09/06 2010 31/12/10 0 1579
7 6 04/09/06 2011 31/12/11 0 1944
20 1 24/10/03 2003 31/12/03 0 68
20 2 24/10/03 2004 31/12/04 0 434
20 3 24/10/03 2005 31/12/05 0 799
20 4 24/10/03 2006 31/12/06 0 1164
20 5 24/10/03 2007 31/12/07 0 1529
20 6 24/10/03 2008 31/12/08 0 1895
20 7 24/10/03 2009 31/12/09 0 2260
20 8 24/10/03 2010 31/12/10 0 2625
20 9 24/10/03 2011 31/12/11 0 2990
87 1 17/01/06 2006 31/12/06 0 348
87 2 17/01/06 2007 31/12/07 0 713
87 3 17/01/06 2008 31/12/08 0 1079
87 4 17/01/06 2009 31/12/09 0 1444
87 5 17/01/06 2010 31/12/10 0 1809")
Note that the date variables do not have this format in my actual dataset: they are stored as POSIXct in the format "%Y-%m-%d". The format somehow changes when I paste the data into Stack Overflow and apply code formatting.
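A minimal sketch for making the sample match the real data's types, assuming the printed dates are day/month/two-digit-year: convert time and end with base R's as.Date(). It uses only a few rows of the sample above. Date is chosen here instead of POSIXct as an assumption, because subtracting two Date values gives a difference in days with no time-zone effects; as.POSIXct(x, format = "%d/%m/%y", tz = "UTC") would also work.

```r
# Assumption: the printed dates are day/month/two-digit-year ("%d/%m/%y");
# "%y" maps 00-68 to 2000-2068, so "06" becomes 2006.
data1 <- read.table(header = TRUE, text = "IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
20 1 24/10/03 2003 31/12/03 0 68
20 2 24/10/03 2004 31/12/04 0 434")

# Convert the character date columns to Date objects
data1$time <- as.Date(data1$time, format = "%d/%m/%y")
data1$end  <- as.Date(data1$end,  format = "%d/%m/%y")

str(data1)

# Days from entry (time) to the end of each calendar year
as.numeric(data1$end - data1$time)   # 118 483 68 434
```

In every row of the 20-row sample, survival equals end minus time in days, which suggests survival is counted from each person's entry date.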
Okay, the problem: I'm trying to create new survival time variables in the same dataset. One is for a Cox regression model with start and stop times (survival is the stop time, and the new start variable should be called survcox).
I'm also trying to fit a Poisson regression, where the offset variable (i.e., the survival time variable) should be called survpois. This is the code I'm trying to use:
data2 <- data1 %>%
  group_by(IDnr) %>%
  mutate(survcox = ifelse(visit == 1, 0, lag(survival)),
         year_aar = substr(data1$year, 1, 4),
         first_day = as.POSIXct(paste0(year_aar, "-01-01-")),
         survpois = as.numeric(data1$end - first_day) + 1) %>%
  mutate(survpois = ifelse(year_aar > first_day, as.numeric(end - year_aar),
                           survpois)) %>%
  ungroup()
I receive an error in this step!
Error: incompatible size (1345000), expecting 6 (the group size) or 1
I have no idea why I get this error, what it means, or why my code doesn't work.
All the help I can get is appreciated, thanks in advance!
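The message reflects dplyr's size rule: inside a grouped mutate(), each expression is evaluated once per group, and its result must have either the group's number of rows or length 1. Writing data1$year bypasses the grouping and returns the entire column, which is consistent with the error reporting 1345000 against a group size of 6 (the first IDnr in the sample, 7, has 6 rows). The sketch below reproduces the size mismatch in base R with the IDnr and year columns of the 20-row sample; fits_group() is a hypothetical helper that mimics the rule, not a dplyr function.

```r
# IDnr and year columns of the 20-row sample above
data1 <- data.frame(
  IDnr = rep(c(7, 20, 87), times = c(6, 9, 5)),
  year = c(2006:2011, 2003:2011, 2006:2010)
)

# Hypothetical helper mimicking dplyr's rule for a new column in a grouped
# mutate(): its length must equal the group size or be 1
fits_group <- function(value, group_size) length(value) %in% c(group_size, 1)

size_7 <- sum(data1$IDnr == 7)   # 6 rows in the first group (IDnr 7)

# data1$year ignores the grouping: all 20 values, not 6
fits_group(substr(data1$year, 1, 4), size_7)                   # FALSE
# what a bare `year` evaluates to inside the IDnr == 7 group
fits_group(substr(data1$year[data1$IDnr == 7], 1, 4), size_7)  # TRUE
```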
year_aar = substr(data1$year, 1, 4)
seems to be causing the error. You probably meant year_aar = substr(year, 1, 4).
It seems that you have more things going on, too. – jazzurro
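A possible corrected pipeline, as a sketch rather than a definitive fix: columns are referred to by bare name (as the comment suggests), the dates are Date objects, and lag() supplies the Cox start time. The survpois rule is an assumption about the intent of the second mutate(), which is replaced here: days from entry to end in the first year, and days from 1 January to end (inclusive, hence the + 1) in later years. It requires dplyr and uses a few rows from the sample.

```r
library(dplyr)

# A few rows of the sample; dates assumed to be day/month/two-digit-year
data1 <- read.table(header = TRUE, text = "IDnr visit time year end event survival
7 1 04/09/06 2006 31/12/06 0 118
7 2 04/09/06 2007 31/12/07 0 483
7 3 04/09/06 2008 31/12/08 0 849
20 1 24/10/03 2003 31/12/03 0 68
20 2 24/10/03 2004 31/12/04 0 434")
data1$time <- as.Date(data1$time, format = "%d/%m/%y")
data1$end  <- as.Date(data1$end,  format = "%d/%m/%y")

data2 <- data1 %>%
  group_by(IDnr) %>%
  arrange(visit, .by_group = TRUE) %>%   # lag() relies on visit order
  mutate(
    # Cox start time: previous stop time, 0 at the first visit
    # (same as ifelse(visit == 1, 0, lag(survival)) when sorted by visit)
    survcox   = lag(survival, default = 0L),
    # bare column names, not data1$year, so each group sees only its own rows
    year_aar  = substr(year, 1, 4),
    first_day = as.Date(paste0(year_aar, "-01-01")),
    # assumed offset: days from entry in the first year,
    # days from 1 January to end (inclusive) in later years
    survpois  = if_else(visit == 1,
                        as.numeric(end - time),
                        as.numeric(end - first_day) + 1)
  ) %>%
  ungroup()

data2 %>% select(IDnr, visit, survival, survcox, survpois)
```

For these rows survcox is 0, 118, 483, 0, 68 and survpois is 118, 365, 366, 68, 366. Here survpois also equals survival - survcox, so if survival is always counted from entry, that difference alone may serve as the offset.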