I have a linear model in which one of the independent variables is a factor, and I am trying to make predictions on a data set that contains a new factor level (one that wasn't in the data set the model was estimated on). I want to make predictions for the observations with the new factor level by manually specifying the coefficient that will be applied to that level. For example, suppose I estimate daily sales volumes for three types of stores, and I then introduce a fourth type of store into the dataset. I have no historical data for it, but I might assume it will behave like some weighted combination of the other store types, for which I have model coefficients.
If I try to apply predict.lm() to the new data I will get an error telling me that the factor has new levels (this makes sense).
df <- data.frame(y=rnorm(100), x1=factor(rep(1:4,25)))
lm1 <- lm(y ~ x1, data=df)
newdata <- data.frame(y=rnorm(100), x1=factor(rep(1:5,20)))
predict(lm1, newdata)
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
factor x1 has new levels 5
I could do the prediction manually by simply multiplying the coefficients by the individual columns in the data.frame. However, this is cumbersome given that the real model I'm working with has many variables and interaction terms, and I want to be able to easily cycle through various model specifications by changing the model formula. Is there a way for me to essentially add a new coefficient to a model object and then use it to make forecasts? If not, is there another approach that is less cumbersome than setting up the entire prediction step manually?
?update may show you how to manipulate a formula programmatically without recourse to using strings. - dardisco
Take model.matrix and coefficients from the lm object, insert the factor level and coefficient, and then use matrix multiplication to obtain the predictions. - Edwin
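A minimal sketch of the model.matrix route suggested in the comments, using the toy data from the question. It assumes the default treatment contrasts, and the weights w and the resulting coefficient for level 5 are purely hypothetical (an equal-weight average of the existing level effects); substitute whatever combination you believe in. The design matrix is rebuilt from the fitted model's own terms, so changing the model formula does not require rewriting the prediction step.

```r
set.seed(1)
df <- data.frame(y = rnorm(100), x1 = factor(rep(1:4, 25)))
lm1 <- lm(y ~ x1, data = df)
newdata <- data.frame(y = rnorm(100), x1 = factor(rep(1:5, 20)))

beta <- coef(lm1)

# Hypothetical assumption: level 5 behaves like an equal-weight mix of levels 1-4.
# Under treatment contrasts the baseline level 1 has an implicit effect of 0.
w <- c(0.25, 0.25, 0.25, 0.25)
level_effects <- c(0, beta[c("x12", "x13", "x14")])
beta_new <- c(beta, x15 = sum(w * level_effects))

# Build the design matrix for newdata from the right-hand side of the fitted formula.
# Because newdata$x1 carries level 5, the matrix gains an x15 column.
X <- model.matrix(delete.response(terms(lm1)), data = newdata)

# Every column needs a coefficient; this fails loudly if one is missing
# (for example, interaction columns involving the new level).
stopifnot(setequal(colnames(X), names(beta_new)))

# Align coefficients to the columns by name, then multiply.
pred <- drop(X %*% beta_new[colnames(X)])
```

For rows with the original levels this should reproduce what predict(lm1, ...) returns; only the level-5 rows depend on the assumed coefficient. If the model has interactions with the factor, each new interaction column needs its own manually specified coefficient, and the stopifnot check will point out any that are missing.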