Time series forecasting, dealing with known big orders

Question

I have many data sets with known outliers (big orders)

data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3","14Q4","15Q1", 155782698, 159463653.4, 172741125.6, 204547180, 126049319.8, 138648461.5, 135678842.1, 242568446.1, 177019289.3, 200397120.6, 182516217.1, 306143365.6, 222890269.2, 239062450.2, 229124263.2, 370575384.7, 257757410.5, 256125841.6, 231879306.6, 419580274, 268211059, 276378232.1, 261739468.7, 429127062.8, 254776725.6, 329429882.8, 264012891.6, 496745973.9, 284484362.55),ncol=2,byrow=FALSE)

The top 11 outliers of this specific series are:

outliers <- matrix(c("14Q4","14Q2","12Q1","13Q1","14Q2","11Q1","11Q4","14Q2","13Q4","14Q4","13Q1",20193525.68, 18319234.7, 12896323.62, 12718744.01, 12353002.09, 11936190.13, 11356476.28, 11351192.31, 10101527.85, 9723641.25, 9643214.018),ncol=2,byrow=FALSE)

What methods are there that i can forecast the time series taking these outliers into consideration?

I have already tried replacing the next biggest outlier (so running the data set 10 times replacing the outliers with the next biggest until the 10th data set has all the outliers replaced). I have also tried simply removing the outliers (so again running the data set 10 times removing an outlier each time until all 10 are removed in the 10th data set)

I just want to point out that removing these big orders does not delete the data point completely as there are other deals that happen in that quarter

My code tests the data through multiple forecasting models (ARIMA weighted on the out sample, ARIMA weighted on the in sample, ARIMA weighted, ARIMA, Additive Holt-winters weighted and Multiplcative Holt-winters weighted) so it needs to be something that can be adapted to these multiple models.

Here are a couple more data sets that i used, i do not have the outliers for these series yet though

data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3", 26393.99306, 13820.5037, 23115.82432,    25894.41036,    14926.12574,    15855.8857, 21565.19002,    49373.89675,    27629.10141,    43248.9778, 34231.73851,    83379.26027,    54883.33752,    62863.47728,    47215.92508,    107819.9903,    53239.10602,    71853.5,    59912.7624, 168416.2995,    64565.6211, 94698.38748,    80229.9716, 169205.0023,    70485.55409,    133196.032, 78106.02227), ncol=2,byrow=FALSE)

data <- matrix(c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3",3311.5124,    3459.15634, 2721.486863,    3286.51708, 3087.234059,    2873.810071,    2803.969394,    4336.4792,  4722.894582,    4382.349583,    3668.105825,    4410.45429, 4249.507839,    3861.148928,    3842.57616, 5223.671347,    5969.066896,    4814.551389,    3907.677816,    4944.283864,    4750.734617,    4440.221993,    3580.866991,    3942.253996,    3409.597269,    3615.729974,    3174.395507),ncol=2,byrow=FALSE)

If this is too complicated then an explanation of how, in R, once outliers are detected using certain commands, the data is dealt with to forecast. e.g smoothing etc and how i can approach that writing a code myself (not using the commands that detect outliers)

This question is more about statistics not about programming. Can you move this to Cross validated? — forecaster
Is your last observation correct? It seems to be off by a factor of 10 and has a different format. — J.R.
How do you know what points are outliers? You mention all these weighted methods, do you mean that you want to downweigh the known outliers by some fixed amount that you have determined using other methods? Or would you consider a model that provides a level of smoothing and thus "ignores" outliers without being told which ones they are? — konvas

WaltS WaltS · Accepted Answer · 2015-04-13T13:21:36

Your outliers appear to be seasonal variations with the largest orders appearing in the 4-th quarter. Many of the forecasting models you mentioned include the capability for seasonal adjustments. As an example, the simplest model could have a linear dependence on year with corrections for all seasons. Code would look like:

df <- data.frame(period= c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3",
                       "10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2",
                       "13Q3","13Q4","14Q1","14Q2","14Q3","14Q4","15Q1"),
                 order= c(155782698, 159463653.4, 172741125.6, 204547180, 126049319.8, 138648461.5,
                        135678842.1, 242568446.1, 177019289.3, 200397120.6, 182516217.1, 306143365.6,
                        222890269.2, 239062450.2, 229124263.2, 370575384.7, 257757410.5, 256125841.6,
                        231879306.6, 419580274, 268211059, 276378232.1, 261739468.7, 429127062.8, 254776725.6,
                        329429882.8, 264012891.6, 496745973.9, 42748656.73))

seasonal <- data.frame(year=as.numeric(substr(df$period, 1,2)), qtr=substr(df$period, 3,4), data=df$order)
ord_model <- lm(data ~ year + qtr, data=seasonal)
seasonal <- cbind(seasonal, fitted=ord_model$fitted)
library(reshape2)
library(ggplot2)
plot_fit <- melt(seasonal,id.vars=c("year", "qtr"), variable.name = "Source", value.name="Order" )
ggplot(plot_fit, aes(x=year, y = Order, colour = qtr, shape=Source)) + geom_point(size=3)

which gives the results shown in the chart below: Linear fit with seasonal adjustments

Models with a seasonal adjustment but nonlinear dependence upon year may give better fits.

Time series forecasting, dealing with known big orders

3 Answers