1
votes

I have a data frame:

a <- data.frame(BEG_D=as.Date(c("2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-01","2014-01-08")) , day=c("Mon","Tues","Wed","Thurs","Fri","Satur","Sun","Mon"), WkNo=c(1,1,1,1,1,1,1,2))

Here BEG_D represents the beginning day of the week (with "2014-01-01" being Sunday). To generate the rest of the dates, I have written a custom function and am using it with ddply:

date_generator <- function(f){
    f$seq <- seq(nrow(f))-1
    f$date <- as.Date(f$BEG_D + f$seq)
    return(f)
}

library(plyr)  # provides ddply() and .()
b <- ddply(a,.(WkNo),date_generator)

This works fine. In the new data frame, I have:

seq = c(0,1,2,3,4,5,6,0)
date = c("2014-01-01","2014-01-02","2014-01-03","2014-01-04","2014-01-05","2014-01-06","2014-01-07","2014-01-08")

But on my large data frame this takes a long time. Several other ddply operations were also taking a long time, so I decided to use data.table with the same data.

date_generator <- function(f){
    f[,seq := seq(nrow(f))-1]
    f[,.(date = as.Date(BEG_D + seq))]
    return(f)
}

a[,date_generator(.SD),by=.(WkNo)]

Doing so threw an error:

Error in [.data.table(f, , :=(seq, seq(nrow(f)) - 1)) : .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference.

What is the right way to write this custom function with data.table, and why is ddply so slow for a large data frame?

1
I think you're looking for a[, newdate := BEG_D + 1:.N - 1, by=WkNo] (I don't use plyr and so can't compare.) - Frank
Thanks @Frank. Indeed it works fine for the data.table case. Don't have enough reputation to like your answer though :( - abhiieor

1 Answer

4
votes

Here's the standard way to do that in a data.table:

a[, date := BEG_D + 1:.N - 1, by=WkNo]

The variable .N stores nrow(.SD), the size of the by group. I'd recommend having a look at the excellent introductory materials for the package to get a sense of its idioms.
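For completeness, here is a minimal self-contained sketch of that idiom on the question's toy data. It assumes the data is built as a data.table from the start (an existing data frame could be converted with setDT(a) first). The second half keeps a custom function, but rewritten so that it returns the new columns as a list instead of calling := on .SD. That rewritten date_generator and the copy b are illustrative choices of mine, not something from the question.

```r
library(data.table)

# Toy data from the question, built directly as a data.table.
a <- data.table(
  BEG_D = as.Date(c(rep("2014-01-01", 7), "2014-01-08")),
  day   = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Satur", "Sun", "Mon"),
  WkNo  = c(1, 1, 1, 1, 1, 1, 1, 2)
)
b <- copy(a)  # untouched copy for the function-based variant below

# Direct idiom: add the column by reference, one WkNo group at a time.
# seq_len(.N) - 1L is 0, 1, ..., .N - 1 within each group.
a[, date := BEG_D + seq_len(.N) - 1L, by = WkNo]

# Function-based variant (hypothetical rewrite of date_generator):
# it takes the column it needs and returns the new columns as a list,
# so nothing is ever assigned inside .SD.
date_generator <- function(beg_d) {
  offset <- seq_along(beg_d) - 1L
  list(seq = offset, date = beg_d + offset)
}
b[, c("seq", "date") := date_generator(BEG_D), by = WkNo]

print(a)
print(b)
```

Both variants call := directly in j, which is what the error message asks for, and both add columns by reference rather than building a new table per group.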