Manually set coefficient for new factor level when predicting

Question

I have a linear model in which one of the independent variables is a factor, and I am trying to make predictions on a data set that contains a new factor level (a level that wasn't in the data set the model was estimated on). I want to make predictions for the observations with the new factor level by manually specifying the coefficient that will be applied to that level. For example, suppose I estimate daily sales volumes for three types of stores, and I introduce a fourth type of store into the dataset. I have no historical data for it, but I might assume it will behave like some weighted combination of the other store types, for which I have model coefficients.

If I try to apply predict.lm() to the new data I will get an error telling me that the factor has new levels (this makes sense).

df <- data.frame(y=rnorm(100), x1=factor(rep(1:4,25)))
lm1 <- lm(y ~ x1, data=df)
newdata <- data.frame(y=rnorm(100), x1=factor(rep(1:5,20)))
predict(lm1, newdata)

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor x1 has new levels 5

I could do the prediction manually by simply multiplying the coefficients by the individual columns in the data.frame. However, this is cumbersome given that the real model I'm working with has many variables and interaction terms, and I want to be able to easily cycle through various model specifications by changing the model formula. Is there a way for me to essentially add a new coefficient to a model object and then use it to make forecasts? If not, is there another approach that is less cumbersome than setting up the entire prediction step manually?

?update may show you how to manipulate a formula programmatically without recourse to using strings — dardisco
A bit more detail on how you want to predict for your new level would be good. "Some weighted combination" isn't very precise. — Hong Ooi
If you would like to try this on many models and with different coefficient values for your additional factor level you could write a function to do this. I would try to extract the model.matrix and coefficients from the lm object, insert the factor level and coefficient and then use matrix multiplication to obtain the predictions. — Edwin
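The idea in the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the model uses R's default treatment contrasts, and the value assigned to the new coefficient (an equal-weight blend of the four fitted levels) is a hypothetical choice, not something estimated from data. The seed is added only to make the example reproducible.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

# Design matrix built from newdata, so it includes a dummy column for level 5.
X <- model.matrix(~ x1, data = newdata)  # (Intercept), x12, x13, x14, x15

# Copy the fitted coefficients and append a hand-picked one for level 5.
# ASSUMPTION: level 5 is the equal-weight average of levels 1 to 4. Level 1
# is the reference level (effect 0), so only x12..x14 enter the sum.
beta <- coef(lm1)
beta["x15"] <- 0.25 * sum(beta[c("x12", "x13", "x14")])

# Align coefficients to the matrix columns by name, then multiply.
pred <- drop(X %*% beta[colnames(X)])
```

Matching by column name rather than by position keeps the multiplication correct even if the order of the columns and the coefficients differs.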

Neal Fultz · Accepted Answer · 2013-10-23T00:09:43

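Assuming you want level 5 to be an evenly weighted combination of the four fitted levels, you can build the model matrix for the new data, set the dummy columns for levels 2 to 4 to 0.25 in the level-5 rows (level 1 is the reference level, so its 25% share is already carried by the intercept), drop the level-5 column, and multiply by the coefficients from the model: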

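n.mat <- model.matrix(~x1, data=newdata)  # columns: (Intercept), x12, x13, x14, x15
n.mat[n.mat[,5] == 1, 2:4] <- .25         # level-5 rows: weight 0.25 on levels 2 to 4
n.mat <- n.mat[,-5]                       # drop x15, which has no fitted coefficient
n.prediction <- n.mat %*% coef(lm1)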
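For models with many terms and interactions, one way to avoid editing the model matrix by hand is to let predict() do the work: because the fit is linear in the factor's dummy columns, a weighted average of the predictions obtained by setting the factor to each fitted level equals the prediction from a weighted dummy row, provided the weights sum to 1. The helper below is a hypothetical sketch of that idea (predict_blend and its arguments are illustrative names, not part of any package), and it assumes a plain lm fit with a single factor that has unseen levels.

```r
# Hypothetical helper: rows with an unseen factor level are predicted as a
# weighted average of the predictions for each fitted level.
predict_blend <- function(fit, newdata, factor_name, weights) {
  old_levels <- fit$xlevels[[factor_name]]
  stopifnot(length(weights) == length(old_levels),
            abs(sum(weights) - 1) < 1e-8)

  f <- as.character(newdata[[factor_name]])
  is_new <- !(f %in% old_levels)
  out <- numeric(nrow(newdata))

  # Rows with known levels: rebuild the factor with the training levels
  # and predict as usual.
  if (any(!is_new)) {
    known <- newdata[!is_new, , drop = FALSE]
    known[[factor_name]] <- factor(f[!is_new], levels = old_levels)
    out[!is_new] <- predict(fit, known)
  }

  # Rows with the new level: predict once per fitted level and accumulate
  # the weighted sum.
  if (any(is_new)) {
    unknown <- newdata[is_new, , drop = FALSE]
    for (k in seq_along(old_levels)) {
      unknown[[factor_name]] <- factor(rep(old_levels[k], nrow(unknown)),
                                       levels = old_levels)
      out[is_new] <- out[is_new] + weights[k] * predict(fit, unknown)
    }
  }
  out
}

# Usage with the example data from the question (seed added for reproducibility).
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))
pred <- predict_blend(lm1, newdata, "x1", weights = rep(0.25, 4))
```

With equal weights this should reproduce the matrix calculation above, and since it only calls predict(), changing the model formula should not require changing the helper.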
