I have a data frame:
a <- data.frame(BEG_D=as.Date(c("2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-08")) , day=c("Mon","Tues","Wed","Thurs","Fri","Satur","Sun","Mon"), WkNo=c(1,1,1,1,1,1,1,2))
Here BEG_D represents the beginning day of the week (with "2014-01-01" being Sunday). To generate the rest of the dates, I have written a custom function and am using it with ddply:
date_generator <- function(f){
f$seq <- seq(nrow(f))-1
f$date <- as.Date(f$BEG_D + f$seq)
return(f)
}
library(plyr)  # provides ddply() and .()
b <- ddply(a,.(WkNo),date_generator)
This works fine; in the resulting data frame I have:
seq = c(0,1,2,3,4,5,6,0)
date = c("2014-01-01","2014-01-02","2014-01-03","2014-01-04","2014-01-05","2014-01-06","2014-01-07","2014-01-08")
But on my large data frame this takes a long time, and several other ddply operations were also slow. So I decided to use data.table on the same data:
date_generator <- function(f){
f[,seq := seq(nrow(f))-1]
f[,.(date = as.Date(BEG_D + seq))]
return(f)
}
a[,date_generator(.SD),by=.(WkNo)]
Doing so threw an error:
Error in [.data.table(f, , :=(seq, seq(nrow(f)) - 1)) : .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.
What is the right way to write this custom function with data.table, and why is ddply so slow for large data frames?
a[, newdate := BEG_D + 1:.N - 1, by=WkNo] (I don't use plyr and so can't compare.) - Frank
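For reference, here is a minimal end-to-end sketch of the approach in the comment above, assuming the data.table package is installed and that `a` is built as a data.table rather than a plain data.frame. It shows two options: adding the columns by reference with `:=` directly in `j`, and keeping a custom function that returns new columns instead of calling `:=` on `.SD` (which is what triggers the "`.SD` is locked" error). The column names `seq` and `date` follow the question; the local variable `offset` is only illustrative.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
a <- data.table(
  BEG_D = as.Date(c(rep("2014-01-01", 7), "2014-01-08")),
  day   = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Satur", "Sun", "Mon"),
  WkNo  = c(1, 1, 1, 1, 1, 1, 1, 2)
)

# Option 1: add the columns by reference, grouped by week.
# .N is the number of rows in the current group.
a[, seq := seq_len(.N) - 1L, by = WkNo]
a[, date := BEG_D + seq]

# Option 2: keep a custom function, but have it RETURN the new columns
# as a list instead of using := on .SD (which is locked).
date_generator <- function(f) {
  offset <- seq_len(nrow(f)) - 1L   # 0, 1, 2, ... within the group
  list(seq = offset, date = f$BEG_D + offset)
}
# .SDcols restricts .SD to the one column the function needs.
b <- a[, date_generator(.SD), by = WkNo, .SDcols = "BEG_D"]
```

Option 1 modifies `a` in place, while Option 2 returns a new table `b` containing only `WkNo`, `seq`, and `date`. Whether either is fast enough on the real data is something to measure rather than assume.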