1
votes

I have a dataset, sales_history. Here is the dput output of its first 15 rows:

sales_history <- structure(list(month = c("2008/01", "2008/02", 
    "2008/03", "2008/04", "2008/05", "2008/06", "2008/07", 
    "2008/08", "2008/09", "2008/10", "2008/11", "2008/12", 
    "2009/01", "2009/02", "2009/03"), 
    sales= c(941628, 1277005, 1908769, 2152362, 1556356, 
    3052123, 2627250, 3244551, 3817610, 3580848, 4447715, 
    3332705, 2823324, 3407557, 4148698 )), 
    .Names = c("month", "sales"), 
    row.names = c(NA, 15L), 
    class = "data.frame")

The data covers the months 2008/01 through 2013/10. I ran an auto.arima forecast on it using:

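library(forecast)  # provides auto.arima() and forecast()

arimaforecast <- function(df)
{
    # monthly series starting in January 2008
    ts1 <- ts(df$sales, frequency=12, start=c(2008,1))
    fit <- auto.arima(ts1, ic="bic")
    fc <- forecast(fit, h=20)
    plot(fc)       # draw the 20-month forecast
    invisible(fc)  # return the forecast object rather than plot()'s return value
}
arimaforecast(sales_history)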

Next, I wanted to plot the time series together with the fitted values and the forecast, so I wrote the following:

# full monthly series; the data start in 2008/01
y <- ts(sales_history$sales, frequency=12, start=c(2008,1))
# fit on the data up to 2013/04 and hold out the remaining months
yt <- window(y, end=c(2013,4))
yfit <- auto.arima(yt, ic="bic")
yfor <- forecast(yfit, h=10)
plot(yfor, main="sales Forecasting", sub="OPTIMAL ARIMA MODEL",
    xlab="MONTH", ylab="sales")
lines(fitted(yfor), col="blue")  # fitted history
lines(y, col="red")              # actual sales

The resulting graph is very ugly. How do I produce a better graph that does the following?

  1. Labels the y-axis as 1M, 3M, etc. rather than 1e+06, 3e+06.
  2. Uses green bars to show the historical sales data, while still using lines (with connected dots) to show the fitted history and the forecasts.
1
Could you provide an example data set? Where does the sales_history data frame come from? Alternatively, you could provide your data using dput(sales_history) (or a subset of it if it's a lot of data). - sebkopf
@sebkopf using dput(head(sales_history,n=15)) gives the following output: structure(list(month = c("2008/01", "2008/02", "2008/03", "2008/04", "2008/05", "2008/06", "2008/07", "2008/08", "2008/09", "2008/10", "2008/11", "2008/12", "2009/01", "2009/02", "2009/03"), sales= c(941628, 1277005, 1908769, 2152362, 1556356, 3052123, 2627250, 3244551, 3817610, 3580848, 4447715, 3332705, 2823324, 3407557, 4148698 )), .Names = c("month", "sales"), row.names = c(NA, 15L), class = "data.frame") Thanks! - user3698033
Really? Put it in the question. What use is it in the comments? - David Arenburg
@DavidArenburg Thanks for the reminder; I just updated the question. - user3698033

1 Answer

2
votes

I'm still not entirely sure what you mean by the bars, since there is no green line in your graph (only blue and red), but here's a stab at a plot that combines the features I think you're looking for. Since the plot is a bit more complex and I'm not too familiar with the plotting functions available in forecast, I implemented it with the ggplot2 package, which makes very nice graphs and provides a lot of flexibility for adjustments (see the ggplot2 documentation for details).

The first part of the code takes the forecast object yfor from your code example and turns it into data frames that are easy to use in ggplot. You can improve this section by using a Date object instead of a numeric time scale if you'd like more flexibility in the x-axis labeling. The second part plots it. The plot is rather cut off since it only uses a subset of your data, but the code works just as well with the whole data set.

# convert forecast object into data frame
ts_values <- data.frame(
    time = as.numeric(time(yfor$x)),  
    sales = as.numeric(yfor$x), 
    fit = as.numeric(yfor$fitted))
ts_forecast <- data.frame(
    time = as.numeric(time(yfor$mean)),
    fit = as.numeric(yfor$mean),
    upper.80 = as.numeric(yfor$upper[,1]),
    upper.95 = as.numeric(yfor$upper[,2]),
    lower.80 = as.numeric(yfor$lower[,1]),
    lower.95 = as.numeric(yfor$lower[,2]))

# combine fitted data and forecast mean
ts_values <- rbind(ts_values, transform(ts_forecast[c("time", "fit")], sales = NA))

# plot it all
library(ggplot2)
ggplot(NULL, aes(x = time)) + 
    geom_bar(data = ts_values, aes(y = sales), stat = "identity", 
             fill = "dark green", position="dodge") + 
    geom_line(data = ts_values, aes(y = fit), colour = "red", size = 2) + 
    geom_ribbon(data = ts_forecast, aes(ymin = lower.95,  ymax = upper.95),  
                alpha=.2,  fill="red") + 
    geom_ribbon(data = ts_forecast, aes(ymin = lower.80,  ymax = upper.80),  
                alpha=.2,  fill="red") +
    scale_y_continuous(labels = function(x) paste0(x/10^6, "M"), expand = c(0,0)) +
    theme_bw()
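
The y-axis labels in the plot come from an inline function passed to scale_y_continuous. If you want to reuse that formatting, a small named helper is one option. This is only a sketch in base R; the name label_millions and its digits argument are my own and not part of ggplot2 or forecast.

```r
# Hypothetical helper (not from any package): format numbers in millions,
# e.g. 3e6 -> "3M", 2500000 -> "2.5M".
label_millions <- function(x, digits = 1) {
    out <- paste0(round(x / 1e6, digits), "M")
    out[is.na(x)] <- NA_character_  # keep missing values missing instead of "NAM"
    out
}

label_millions(c(1e6, 3e6, 2500000))
```

It should drop into the plot as scale_y_continuous(labels = label_millions, expand = c(0,0)).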

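
On the Date suggestion above: time() on a monthly ts returns decimal years (2008.000, 2008.083, ...), which can be turned into first-of-month dates. Below is a minimal sketch that assumes a monthly series (frequency = 12); ts_to_date is a hypothetical name, and the toy series only stands in for yfor$x.

```r
# Hypothetical helper: convert the time index of a monthly ts to Date values
# (first day of each month). Assumes frequency(x) == 12.
ts_to_date <- function(x) {
    stopifnot(frequency(x) == 12)
    # count whole months since year 0 to avoid floating-point trouble
    months_total <- round(as.numeric(time(x)) * 12)
    yr <- as.integer(months_total %/% 12)
    mo <- as.integer(months_total %% 12) + 1L
    as.Date(sprintf("%d-%02d-01", yr, mo))
}

# toy monthly series standing in for yfor$x
toy <- ts(c(941628, 1277005, 1908769), frequency = 12, start = c(2008, 1))
dates <- ts_to_date(toy)
format(dates, "%Y/%m")
```

With that in place, the time columns could be built as ts_to_date(yfor$x) and ts_to_date(yfor$mean), and scale_x_date(date_labels = "%Y/%m") added to the plot to label the x-axis by month.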