I am trying to compute the accuracy of predictions using mean absolute scaled error (MASE) for cross-sectional (non-time series) data in R. I have a vector of forecasted values and a vector of observed values. According to Rob Hyndman, MASE is suitable for non-time series data, and according to his textbook, errors for cross-sectional data should be scaled relative to the mean forecast.
I tried to calculate MASE with the accuracy() function of the forecast package, which, according to its documentation, uses the in-sample mean forecasts to scale errors for non-time series data. However, as others have noted, accuracy() does not compute MASE when given two vectors as arguments, because it needs historical data to compute the scaling factor. With time-series data I could create a forecast object and pass it to accuracy(), but I'm not sure how to do the equivalent with cross-sectional data.
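For reference, here is the scaled error for cross-sectional data as I understand it from the textbook. The notation is my own transcription, so treat it as an assumption rather than a quotation: e_j is the forecast error for case j, y_i are the N observed values used for scaling, and \bar{y} is their mean.

```latex
q_j = \frac{e_j}{\dfrac{1}{N}\sum_{i=1}^{N} \lvert y_i - \bar{y} \rvert},
\qquad
\mathrm{MASE} = \operatorname{mean}\bigl(\lvert q_j \rvert\bigr),
\qquad
e_j = y_j - \hat{y}_j
```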
Here's my attempt to create a function to calculate MASE based on Hyndman's formula for scaling errors in cross-sectional data:
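computeMASE <- function(forecast, actual) {
  # Keep only the pairs where both forecast and actual are present
  mydata <- data.frame(na.omit(cbind(forecast, actual)))
  n <- nrow(mydata)
  # Center of the scaling term: the mean of the forecasted values
  scalingFactor <- mean(mydata$forecast)
  errors <- mydata$actual - mydata$forecast
  # Scale each error by the mean absolute deviation of the actual values
  # from the scaling factor
  scaledErrors <- errors / (sum(abs(mydata$actual - scalingFactor)) / n)
  MASE <- mean(abs(scaledErrors))
  return(MASE)
}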
Here's a small example:
set.seed(33333)
observedValues <- rnorm(1000)
forecastedValues <- observedValues + rnorm(1000, sd=.5)
observedValues[sample(1:1000, 10)] <- NA
forecastedValues[sample(1:1000, 10)] <- NA
computeMASE(forecast = forecastedValues, actual = observedValues)
[1] 0.5147389
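For comparison, here is a minimal sketch of a variant that centers the scaling term on the mean of the observed values rather than on the mean of the forecasts. The name computeMASE_obs is hypothetical, and whether this is the intended reading of Hyndman's formula is an assumption on my part:

```r
# Hypothetical variant (assumption): scale by the mean absolute deviation
# of the observed values around their own mean.
computeMASE_obs <- function(forecast, actual) {
  keep <- complete.cases(forecast, actual)  # drop pairs with any NA
  forecast <- forecast[keep]
  actual <- actual[keep]
  # MAE obtained by "forecasting" every case with the mean of the actuals
  scalingFactor <- mean(abs(actual - mean(actual)))
  mean(abs(actual - forecast)) / scalingFactor
}

# Small hand-checkable example: MAE = 0.4, scaling term = 1.2
actual <- c(1, 2, 3, 4, 5)
forecast <- c(2, 2, 3, 4, 4)
computeMASE_obs(forecast, actual)
```

The two versions differ only in which mean is subtracted inside the scaling term, so they coincide whenever the forecasts and the observed values have the same mean.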
It's not clear to me that I should pass an lm model to accuracy(), because my vector of forecasted values already contains the forecasts themselves, not values of a predictor from which forecasts are generated. Indeed, passing the two vectors directly gives different accuracy estimates than passing an lm model:
library(forecast)
round(accuracy(f=forecastedValues, x=observedValues), 2)
         ME RMSE  MAE    MPE   MAPE
Test set  0 0.51 0.41 -55.62 259.12
round(accuracy(f=lm(observedValues ~ forecastedValues)), 2)
             ME RMSE  MAE    MPE   MAPE MASE
Training set  0 0.44 0.35 -20.87 203.64 0.44
My function returns the same MASE as accuracy() when I pass it the predictions from an lm model:
computeMASE(forecast = predict(lm(observedValues ~ forecastedValues, na.action=na.exclude)), actual = observedValues)
[1] 0.4413931
accuracy(f=lm(observedValues ~ forecastedValues))
                       ME      RMSE       MAE       MPE     MAPE      MASE
Training set 2.014282e-17 0.4388396 0.3488355 -20.86792 203.6389 0.4413931
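One possible explanation for this agreement, offered as a sketch rather than a statement about the internals of accuracy(): for a least-squares fit with an intercept, the fitted values have the same mean as the response, so centering the scaling term on the mean forecast or on the mean of the observed values gives the same result. The simulated x and y below are illustrative only:

```r
# Illustrative data (not the vectors from the example above)
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)

# Ordinary least squares with an intercept
fit <- lm(y ~ x)

# The mean of the fitted values equals the mean of the response
all.equal(mean(fitted(fit)), mean(y))

# So both centerings give the same scaling term for these predictions
scaleByForecastMean <- mean(abs(y - mean(fitted(fit))))
scaleByObservedMean <- mean(abs(y - mean(y)))
all.equal(scaleByForecastMean, scaleByObservedMean)
```

For raw forecasts that did not come from such a fit, the two means generally differ, which would make the choice of centering matter.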
I have two questions:
- Is my function correct for calculating MASE with cross-sectional data, according to Hyndman's formula?
- Is there a simpler way to calculate MASE with cross-sectional data using the accuracy() function, without needing to write my own function?