
I am working with time series data (6000 observations spanning more than 7000 days) that is described as "daily". However, measurements do not necessarily happen every day, and sometimes several measurements happen on the same day. This is why I think it would be better to analyze this data at a "monthly" level instead of "daily".

I created some dummy data that represents my time series:

library(xts)

date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")

date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")

property_damages_in_dollars <- rnorm(731,100,10)

final_data <- data.frame(date_decision_made, property_damages_in_dollars)

I am trying to follow the instructions from "Getting a monthly time series from weekly data":

y.mon  <- aggregate(final_data, by = format(date_decision_made,format="%Y/%m/%d"), FUN=sum)

But this code gives me an error:

Error in aggregate.data.frame(final_data, by = format(date_decision_made,  : 
  'by' must be a list

Does someone know what I am doing wrong?

Thanks

What and where is final_dataset? - Duck
@Duck: sorry, it was a typo. I have fixed the mistake. (final_dataset was the same as final_data) - stats555
Try this: y.mon <- aggregate(property_damages_in_dollars~format(date_decision_made,format="%Y/%m/%d"),data=final_data, FUN=sum) - Duck
And the proper option for months would be: y.mon <- aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),format="%Y/%m"),data=final_data, FUN=sum) - Duck
Monthly is definitely the better choice, since it avoids collinearity issues caused by the zeros! Please do not forget to accept the answer! - Duck

1 Answer


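One possible solution to the OP's issue is to use the formula interface of aggregate() and group on a year/month key. The original call fails because aggregate.data.frame() requires by to be a list, and its "%Y/%m/%d" format would in any case group by day rather than by month: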

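# Sum the damages per month: as.Date() parses the "YYYY/MM/DD" strings,
# and format(..., "%Y/%m") reduces each date to its year/month key
y.mon <- aggregate(property_damages_in_dollars ~ format(as.Date(date_decision_made), format = "%Y/%m"),
                   data = final_data, FUN = sum)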

Other groupings follow from changing the format() string, for example "%Y" for yearly totals.
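
As a rough sketch of two such variants, the code below rebuilds the dummy data from the question and then aggregates it with the by = interface, wrapping the grouping key in a list() so that the "'by' must be a list" error does not occur. The set.seed() call and the names dates, month_key, year_key and y.year are not from the question; they are assumptions made only for this illustration.

```r
# Rebuild the question's dummy data (set.seed is added here only for reproducibility)
set.seed(1)
dates <- seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "day")
final_data <- data.frame(
  date_decision_made = format(dates, "%Y/%m/%d"),
  property_damages_in_dollars = rnorm(length(dates), 100, 10)
)

# Variant 1: keep the by = interface, but pass the grouping key inside a list
month_key <- format(as.Date(final_data$date_decision_made), "%Y/%m")
y.mon <- aggregate(final_data["property_damages_in_dollars"],
                   by = list(month = month_key), FUN = sum)

# Variant 2: yearly totals, obtained by changing only the format string
year_key <- format(as.Date(final_data$date_decision_made), "%Y")
y.year <- aggregate(final_data["property_damages_in_dollars"],
                    by = list(year = year_key), FUN = sum)

head(y.mon)
y.year
```

Because the grouping key is derived from the date alone, days with several measurements are simply summed into their month, and days with no measurement contribute nothing, which should match the irregular "daily" data described in the question.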