1
votes

I want to aggregate a dataframe in R by week and I am trying to use lubridate to do it.

date = as.Date(c('2006-05-02','2007-05-03','2006-05-04','2006-05-05','2006-05-08','2006-05-09'))
total = c(1,2,3,4,5,10)
df=data.frame(date, total)

I used the lubridate package to do the following:

df$wk = weeks(agg$date)
agg = aggregate(data=agg, total ~ date + variable , FUN=sum)

This does not seem to return anything that works. You can cast the weeks to strings, but then you would need to cast the weeks back to normal R dates.

df$wk = as.character(weeks(agg$date))
agg = aggregate(data=agg, total ~ date , FUN=sum)

This poses another problem: the dates are now strings that look like this:

"113029d 0H 0M 0S"

I want to use ggplot on the agg dataframe, so I would need to convert this string into something that ggplot can understand. as.Date() obviously doesn't work. I might be able to convert the days into a Unix timestamp, but that seems like too much effort.

How do I convert lubridate objects into normal R dates so that I can perform the aggregation? Normal R dates work perfectly fine in the aggregate function, so I would prefer to use lubridate only for binning the dates into weeks.

2
Getting error: df$wk = weeks(agg$date) Error in lapply(list(...), as.numeric) : object 'agg' not found - akrun
What is agg, and what are value and variable? - David Arenburg
Also, weeks() does not work on a Date object but on a numeric value, so your method won't work. Please just show us the desired output; I think it can easily be solved using base R. - David Arenburg

2 Answers

3
votes

I'm not entirely sure regarding your desired output, but this should work (using base R only)

df$Weeks <- paste(format(df$date, "%U"), format(df$date, "%Y")) # Setting a week/year combination
temp <- aggregate(total ~ Weeks, df, sum)
temp <- temp[order(substr(temp$Weeks, 4, 8), substr(temp$Weeks, 1, 2)), ] # Ordering by year by week

library(ggplot2)
ggplot(temp, aes(Weeks, total, group = 1)) +
  geom_line() +
  scale_x_discrete(limits = temp$Weeks) # rescaling x axis so it will follow the correct Year/Week order
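
If the x axis should stay a real Date rather than a week/year label, one possible alternative is lubridate's floor_date(), which rounds each date down to the start of its week and returns a Date. The following is only a minimal sketch, assuming the question's df and that weeks starting on Sunday (the floor_date() default) are acceptable; week_start and agg are names chosen here for illustration:

```r
library(lubridate)

# Sample data copied from the question
date  <- as.Date(c('2006-05-02', '2007-05-03', '2006-05-04',
                   '2006-05-05', '2006-05-08', '2006-05-09'))
total <- c(1, 2, 3, 4, 5, 10)
df <- data.frame(date, total)

# Round each date down to the first day of its week; the result is still a Date
df$week_start <- floor_date(df$date, unit = "week")

# Sum the totals per week; week_start remains a Date column in the result
agg <- aggregate(total ~ week_start, data = df, FUN = sum)
```

Because agg$week_start is a Date, it should be usable directly as a continuous axis, for example ggplot(agg, aes(week_start, total)) + geom_line(), without the manual ordering step.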


0
votes

You can probably use data.table:

require(data.table)
dt <- data.table(df)
dt[,sum(total),by=list(year(date),week(date))]
    year week V1
 1: 2006   18 10
 2: 2006   19 15
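
The year/week pairs above are integers, so they cannot be plotted directly as dates. If a Date column is wanted for ggplot, one option is to group by the first day of each week instead. This is a sketch under assumptions, not part of the answer above: it uses base R's cut(), whose weeks start on Monday by default, and week_start and res are illustrative names:

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  date  = as.Date(c('2006-05-02', '2007-05-03', '2006-05-04',
                    '2006-05-05', '2006-05-08', '2006-05-09')),
  total = c(1, 2, 3, 4, 5, 10)
)

# cut() labels each weekly bin with its first day (a Monday by default);
# as.Date() turns those labels back into real dates.
# keyby sorts the result by week_start.
res <- dt[, .(total = sum(total)),
          keyby = .(week_start = as.Date(cut(date, "week")))]
```

Each row of res then holds one week's sum, keyed by a Date that ggplot can place on a continuous axis.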