I am trying to do time series modeling and forecasting using R based on weekly data like below:

biz week     Amount        Count
2006-12-27   973710.7     816570
2007-01-03  4503493.2    3223259
2007-01-10  2593355.9    1659136
2007-01-17  2897670.9    2127792
2007-01-24  3590427.5    2919482
2007-01-31  3761025.7    2981363
2007-02-07  3550213.1    2773988
2007-02-14  3978005.1    3219907
2007-02-21  4020536.0    3027837
2007-02-28  4038007.9    3191570
2007-03-07  3504142.2    2816720
2007-03-14  3427323.1    2703761
...
2014-02-26  99999999.9   1234567

About my data: as seen above, each week is labeled by its first day (my week starts on Wednesday and ends on Tuesday). When I construct my ts object, I tried

ts <- ts(df, frequency=52, start=c(2007,1))

The problems I have are:

1) Some years may have 53 weeks, so frequency=52 will not work for those years;

2) My first week starts on 2006-12-27. How should I set the start parameter: start=c(2006,52) or start=c(2007,1), given that the week of 2006-12-27 crosses the year boundary? Also, for modeling, is it better to use only complete years? For example, if I only had a partial year of data for 2007, should I drop it and start with 2008? And since 2014 is not complete yet, should I include what I have so far or not? Either way, I still need to decide how to treat weeks that cross a year boundary, like 2006-12-27: should it count as week 1 of 2007 or as the last week of 2006?

3) When I use ts <- ts(df, frequency=52, start=c(2007,1)) and then print it, I get the output shown below. Instead of 2007.01, 2007.02, ..., 2007.52, I get 2007.000, 2007.019, ..., where the step comes from 1/52 = 0.019. This is mathematically correct but not easy to interpret. Is there a way to label each row with the date itself, as in a data frame, or at least as 2007 wk1, 2007 wk2, ...?

=========

Time Series:
Start = c(2007, 1) 
End = c(2014, 11) 
Frequency = 52 
          Amount        Count
2007.000   645575.4     493717
2007.019  2185193.2    1659577
2007.038  1016711.8     860777
2007.058  1894056.4    1450101
2007.077  2317517.6    1757219
2007.096  2522955.8    1794512
2007.115  2266107.3    1723002 

4) My goal is to model this weekly data and then decompose it to see the seasonal components. It seems I have to use the ts() function to convert to a ts object so that I can use the decompose() function. I tried xts() and got an error stating "time series has no or less than 2 periods". I guess this is because xts() won't let me specify the frequency, right?

xts <- xts(df,order.by=businessWeekDate)

5) I looked for the answer in this forum and other places as well; most of the examples are monthly, and though there are some weekly time series questions, none of the answers are straightforward. Hopefully somebody can help answer my questions here.

3 Answers

26
votes

Using non-integer frequencies works quite well and is compatible with most models (auto.arima, ets, ...). For the start date, I just use the convenience functions in lubridate. The important thing here is to be consistent when working with multiple time series that may have different start and end dates.

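library(lubridate)

# 365.25/7 is the average number of weeks per year (about 52.18),
# so 53-week years need no special handling
y <- ts(df$Amount,
        frequency = 365.25/7,
        start = decimal_date(ymd("2006-12-27")))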
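
If you would rather not depend on lubridate, the decimal start can be computed in base R. This is only a sketch: the data below is made up for illustration (`dates` and `amount` are hypothetical stand-ins for the asker's columns), and the hand-rolled decimal year is my own approximation of what `decimal_date()` returns for a plain date.

```r
# Hypothetical weekly data: Wednesdays starting 2006-12-27 (not the asker's real data)
dates <- seq(as.Date("2006-12-27"), by = "week", length.out = 376)
set.seed(1)
amount <- 3e6 + 5e5 * sin(2 * pi * seq_along(dates) / (365.25 / 7)) +
  rnorm(length(dates), sd = 1e5)

# Decimal year of the first date: year + (days elapsed in year) / (days in year)
start_date <- dates[1]
yr         <- as.numeric(format(start_date, "%Y"))
year_start <- as.Date(paste0(yr, "-01-01"))
year_end   <- as.Date(paste0(yr + 1, "-01-01"))
start_dec  <- yr + as.numeric(start_date - year_start) /
  as.numeric(year_end - year_start)

# Weekly series with a non-integer frequency (average weeks per year)
y <- ts(amount, frequency = 365.25 / 7, start = start_dec)

# The time index is in decimal years, one step = 7/365.25 of a year
head(time(y), 3)
```

Using the same rule for every series keeps their time indexes comparable even when they start and end on different dates.
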
2
votes

First make sure that your data has exactly 52 observations per year. To do that, identify the years with 53 observations and remove the week that matters least for your seasonality pattern (for instance, do not remove a week in December if you want to check the Christmas sales seasonality!).

xts is a good format because it is more flexible; however, the decomposition and forecasting tools usually work with ts objects, as they require a fixed number of observations per cycle.

Regarding your question on incomplete years: it should not be an issue. R doesn't know when January or December is, so a year can start and end at any point.
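
The trimming described above could look roughly like the sketch below. The data frame is synthetic, the column names are hypothetical, and dropping week 27 (around early July) is an arbitrary choice made only for illustration; pick whichever week matters least for your own seasonal pattern.

```r
# Hypothetical weekly data: every Wednesday from 2007-01-03 to 2013-12-25
set.seed(123)
df <- data.frame(week = seq(as.Date("2007-01-03"), as.Date("2013-12-25"), by = "week"))
df$Amount <- 3e6 + 5e5 * sin(2 * pi * seq_len(nrow(df)) / 52) +
  rnorm(nrow(df), sd = 1e5)

# Calendar year of each week's first day, and the week's position within that year
df$year <- as.integer(format(df$week, "%Y"))
df$idx  <- ave(seq_along(df$week), df$year, FUN = seq_along)

# Years that contain 53 weekly observations
n_per_year <- table(df$year)
long_years <- as.integer(names(n_per_year)[n_per_year == 53])

# Drop one week in each 53-week year (week 27 here, an arbitrary mid-year choice)
df52 <- df[!(df$year %in% long_years & df$idx == 27), ]

# Every year now has exactly 52 observations, so frequency = 52 is consistent
y   <- ts(df52$Amount, frequency = 52, start = c(2007, 1))
dec <- decompose(y)
```

After trimming, `table(df52$year)` should show 52 for every year, and `dec$seasonal` holds the estimated weekly seasonal component.
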

0
votes

Concerning your 4th question, I think the error occurs because you have only one period of data (52 weeks); you may need another 52 weeks of data to make up the 2 complete periods that decompose() requires.
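
A small experiment with made-up numbers shows when decompose() raises that error: it does so when the series has fewer than two full periods, and also when the object carries no seasonal frequency at all (frequency 1). The variable names below are hypothetical.

```r
set.seed(42)
x <- rnorm(104, mean = 100)  # 104 made-up weekly values

one_period  <- ts(x[1:52], frequency = 52)  # only 1 period of data
two_periods <- ts(x, frequency = 52)        # exactly 2 periods
no_freq     <- ts(x)                        # frequency defaults to 1

# Capture the error message instead of stopping the script
err_one  <- tryCatch(decompose(one_period), error = function(e) conditionMessage(e))
err_freq <- tryCatch(decompose(no_freq),    error = function(e) conditionMessage(e))

# With two full periods and frequency = 52 the call succeeds
ok <- decompose(two_periods)

print(err_one)
print(err_freq)
class(ok)
```

So it is worth checking both `length(x)` and `frequency(x)` on whatever object you pass to decompose().
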