0
votes

How do you predict in mgcv::gam when you've fitted a model that might contain random effects?

The other thread on this site with the "exclude" trick does not work for me (https://stats.stackexchange.com/questions/131106/predicting-with-random-effects-in-mgcv-gam)


ya <- rnorm(100, 0, 1)
yb <- rnorm(100,0,1.5)
yc <- rnorm(100, 0, 2)
yd <- rnorm(100, 0, 2.5)

yy <- c(ya,yb,yc,yd) #so, now we've got data from 4 different groups. 
xx <- c(rep("a", 100), rep("b",100), rep("c",100),rep("d",100)) #groups
zz <- rnorm(400,0,1) #some other covariate

model <- gam(yy ~ zz + s(xx, bs = "re")) #the model

predictdata <- data.frame( zz = 5 )   #new data
predict(model, newdata = predictdata, exclude = "s(xx)")   #prediction

and this produces the error

Error in model.frame.default(ff, data = newdata, na.action = na.act) : 
  variable lengths differ (found for 'xx')
In addition: Warning messages:
1: In predict.gam(model, newdata = predictdata, exclude = "s(xx)") :
  not all required variables have been supplied in  newdata!

2: 'newdata' had 1 row but variables found have 400 rows 

My mgcv package is the latest.

EDIT:

If you change predictdata to

predictdata <- data.frame(zz = 5, xx = "f")

then it says

Error in predict.gam(model, newdata = predictdata, exclude = "s(xx)") : 
  f not in original fit
1

1 Answers

1
votes

I experimented with your example and it seems that the 'exclude' statement does work, even though you have to specify in newdata values for the random effects that were included in the original dataset used to fit the model. This however, makes me a bit uneasy. Another caveat is that 'exclude' did not seem to work on a model with a variance structure that was estimated separately by group (I tried this with another dataset), i.e., something like s(xx, s="re", by=group). You might want to post or have the question moved to crossvalidated so that other statisticians/analysts can see it an perhaps provide a better answer.

Below is my code. Note that I changed the means for groups a and d, yet the overall mean should be around zero.

ya <- rnorm(100, 1, 1)
yb <- rnorm(100, 0,1.5)
yc <- rnorm(100, 0, 2)
yd <- rnorm(100, -1, 2.5)

yy <- c(ya,yb,yc,yd) #so, now we've got data from 4 different groups. 
xx <- c(rep("a", 100), rep("b",100), rep("c",100),rep("d",100)) #groups
zz <- rnorm(400,0,1) #some other covariate

some.data= data.frame(yy,xx,zz)
model <- gam(yy ~ zz + s(xx, bs = "re"),data=some.data) #the model


# the intercept is the overall mean when zz is zero
summary(model)

 predictdata <- data.frame(zz = c(0,0,0,0), xx =c("a","b","c","d"))  #new data

#excluding random effects. Estimate should be the same for all and should be the intercept  
predict(model, newdata = predictdata, exclude = "s(xx)") 

#including random effects. Estimates should differ by group with 'a' larger and 'd' smaller
predict(model, newdata = predictdata)