I am trying to measure the accuracy of predictions using the mean absolute scaled error (MASE) for cross-sectional (non-time-series) data in R. I have a vector of forecasted values and a vector of observed values. According to Rob Hyndman, MASE is suitable for non-time-series data, and according to his textbook, errors for cross-sectional data should be scaled relative to the mean forecast.

I tried to calculate MASE with the accuracy() function of the forecast package, which, according to its documentation, uses the in-sample mean forecasts to scale errors for non-time-series data. However, as others have noted, accuracy() does not compute MASE when given just two vectors, because it needs historical data to compute the scaling factor. With time-series data I could create a forecast object and pass it to accuracy(), but I'm not sure how to do this with cross-sectional data.
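For reference, this is the cross-sectional scaled error as I read it in Hyndman's textbook. The notation is my own transcription: e_j is the error of forecast j, the y_i are the N in-sample (training) observations, and \bar{y} is their mean, i.e. the "mean forecast".

$$
q_j = \frac{e_j}{\dfrac{1}{N}\sum_{i=1}^{N} \lvert y_i - \bar{y} \rvert},
\qquad
\mathrm{MASE} = \operatorname{mean}\bigl(\lvert q_j \rvert\bigr)
$$

Note that the denominator depends only on the in-sample observations, not on the forecasts being evaluated.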

Here's my attempt to create a function to calculate MASE based on Hyndman's formula for scaling errors in cross-sectional data:

computeMASE <- function(forecast, actual){
  mydata <- data.frame(na.omit(cbind(forecast, actual)))
  n <- nrow(mydata)

  scalingFactor <- mean(mydata$forecast)
  errors <- mydata$actual - mydata$forecast
  scaledErrors <- errors/(sum(abs(mydata$actual - scalingFactor)) / n)

  MASE <- mean(abs(scaledErrors))
  return(MASE)
}

Here's a small example:

set.seed(33333)
observedValues <- rnorm(1000)
forecastedValues <- observedValues + rnorm(1000, sd=.5)

observedValues[sample(1:1000, 10)] <- NA
forecastedValues[sample(1:1000, 10)] <- NA

computeMASE(forecast = forecastedValues, actual = observedValues)
[1] 0.5147389

It's not clear to me that I want to pass an lm model to accuracy(), because my vector already contains the forecasted values themselves, not values of a predictor used to generate forecasts. Indeed, passing the two vectors directly gives different accuracy estimates than passing an lm model:

round(accuracy(f=forecastedValues, x=observedValues), 2)
         ME RMSE  MAE    MPE   MAPE
Test set  0 0.51 0.41 -55.62 259.12

round(accuracy(f=lm(observedValues ~ forecastedValues)), 2)
             ME RMSE  MAE    MPE   MAPE MASE
Training set  0 0.44 0.35 -20.87 203.64 0.44

My function returns the same MASE as accuracy() when I give it the predictions from an lm model:

computeMASE(forecast = predict(lm(observedValues ~ forecastedValues, na.action=na.exclude)), actual = observedValues)
[1] 0.4413931

accuracy(f=lm(observedValues ~ forecastedValues))
                       ME      RMSE       MAE       MPE     MAPE      MASE
Training set 2.014282e-17 0.4388396 0.3488355 -20.86792 203.6389 0.4413931

I have two questions:

  1. Is my function correct for calculating MASE with cross-sectional data, according to Hyndman's formula?
  2. Is there a simpler way to calculate MASE with cross-sectional data using the accuracy() function without needing to write a function?

1 Answer

The help file says that it works. Don't you believe it?

library(forecast)  # provides accuracy() and forecast()

# Generate some artificial training and test data
x <- 1:100
y <- 5 + .1*x + rnorm(100)
xtrain <- sample(x, size=80)
ytrain <- y[xtrain]
xtest <- x[-xtrain]
ytest <- y[-xtrain]

# Compute forecasts from a linear model
forecast <- predict(lm(ytrain~xtrain), newdata=data.frame(xtrain=xtest))

# Plot training data, test data and forecasts
plot(xtrain, ytrain)
lines(xtest,forecast,col='red',pch=19)
points(xtest,ytest,col='blue',pch=19)

# Compute accuracy statistics
accuracy(forecast,ytest)

Both forecast and ytest are numeric vectors, as requested. However, MASE is not produced, because it relies on a scaling factor computed from the training data. It makes no sense to ask for MASE unless you also pass the training data to accuracy(). The simplest way to do that is to pass the whole forecast object, like this:

forecast <- forecast(lm(ytrain~xtrain), newdata=data.frame(xtrain=xtest))
accuracy(forecast,ytest)

The forecast object contains more than just the point forecasts for the future periods. It also contains the training data, uncertainty estimates, and more.

If you don't want to use an lm for prediction, then you have to set up the forecast object yourself, containing at least the point predictions (mean), the in-sample fits (fitted) and the training responses (x). Like this:

forecast <- structure(list(mean=rep(10,20), fitted=rep(10,80),
   x=ytrain), class='forecast')
accuracy(forecast,ytest)
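
To make the role of the training data concrete, here is a minimal base-R sketch of the calculation by hand. It assumes the scaling factor is the mean absolute deviation of the training responses from their own mean, which is how the question describes the documented behaviour for non-time-series data. The helper name mase_cross_sectional and its arguments are hypothetical and not part of the forecast package.

```r
# Hypothetical helper, not part of the forecast package.
# forecast, actual: test-set predictions and observations (same length)
# train: training responses used only to compute the scaling factor
mase_cross_sectional <- function(forecast, actual, train) {
  # Keep only the test cases where both values are present
  ok <- complete.cases(forecast, actual)

  # Assumed scaling: mean absolute deviation of the training data
  # from its mean (the in-sample "mean forecast" benchmark)
  scale <- mean(abs(train - mean(train, na.rm = TRUE)), na.rm = TRUE)

  # Mean absolute test error relative to that benchmark
  mean(abs(actual[ok] - forecast[ok])) / scale
}

# Tiny worked example with made-up numbers
train  <- c(1, 2, 3, 4, 5)   # mean 3, mean absolute deviation 1.2
actual <- c(2, 4, NA)        # the NA case is dropped
fc     <- c(3, 3, 3)         # always predict the training mean

mase_cross_sectional(fc, actual, train)  # 1 / 1.2, about 0.83
```

A value below 1 means the forecasts beat the in-sample mean benchmark on average. Note that the scaling factor never touches the test-set forecasts, which is why two vectors alone are not enough.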