I've got data that is arrayed in a way that has day/month/year in separate columns (which is useful for how I need to subset the data). Each column besides the D/M/Y are columns of values. In my real data, these are counties in the U.S.
I need to sum different ranges of dates that vary by different day, month and year combinations. Given that each column is a county, the ddply/data.table/apply functions seem to be the most appropriate to solve the problem. However, I cannot figure out how to subset my data.
Here is a trivial example:
set.seed(1234)
data <- data.frame(day=rep(seq(1,5),9), month= rep(c(rep(1,5),rep(2,5),rep(11,5)),3),
year=c(rep(1999,15),rep(2000,15),rep(2001,15)), Abel = round(runif(n=45, 0, 5),2) ,
Bertha=round(runif(n=45, 0, 5),2), Charlie= round(runif(n=45,0, 5),2))
dt <- data.table(data)
keycols <- c("day", "month", "year")
setkeyv(dt, keycols) #not sure if this is necessary
Originally, I was thinking of using the ddply() command with the summarize option but have seen other instances where people have recommended using the data.table framework. However, I cannot figure out a way to subset the data in such a way that would get it to match results as follows:
sum(data$Abel[which(data$year==1999 & data$month==2 & data$day >=1 & data$day <4)]) #4.41
sum(data$Bertha[which(data$year==1999 & data$month==2 & data$day >=1 & data$day <4)]) #5.51
sum(data$Charlie[which(data$year==1999 & data$month==2 & data$day >=1 & data$day <4)]) #4.19
While this example might seem easy, I have to calculate 10-day averages of climate data for hundreds of counties over decades, so getting the ddply/data.table framework to work would really help!
I have tried both data.table and ddply, but both with errors:
dt[i=list(year==1999, month==2, day >=1, day<3), sum]
#Error in `[.data.table`(dt, i = list(year == 1999, month == 2, day >= :
# typeof x.day (integer) != typeof i.day (logical)
I realize that this means that I'm doing something wrong with setting up the data.table, but I haven't used it before and have looked through several help files and can't figure out the subset issues.
For ddply, I get a different error when I just try to use one singular column to try and get it to work:
ddply(data, .(year, month, day, Abel), summarize, sum.val = sum )
#Error in vector(type, length) :
# vector: cannot make a vector of mode 'builtin'.
Any guidance on solving this type of problem would be greatly appreciated!
sumexpessions is wrong isn't? - Paulo E. Cardososumstatements are wrong. I just double checked and just confirmed them. - Tonydata$Abel[data$year==1999 & data$month==2 & data$day >=1 & data$day <4]=[1] 3.94 2.28 2.24- Paulo E. Cardoso