zero-inflated GAM prediction with newdata

Question

In a zero-inflated GAM (ziplss), I'm getting a warning when 1) I use new data and 2) the count portion has categorical variables that are NOT in the zero-inflation portion. There's a warning for every categorical variable not represented in the zero-inflation part.

Here's a reproducible example:

library(mgcv)
library(glmmTMB)
data(Salamanders)   
Salamanders$x <- rnorm(nrow(Salamanders), 0, 10)

zipgam <- gam(list(count ~ spp * mined + s(x) + s(site, bs = "re"),
                ~ spp),
           data = Salamanders, family = ziplss, method = "REML")

preds.response <- data.frame(Predict = predict(zipgam, type = "response"))

nd <- data.frame(x = 0, spp = "GP", mined = "yes", site = Salamanders$site[1])      
nd$pred <- predict(zipgam, newdata = nd, exclude="site")

I haven't seen this mentioned anywhere, which is odd and tells me that I'm likely doing something wrong (otherwise this would be available in search results). Would appreciate any insight.

What is Salamander? This isn't reproducible if we don't have that object. — Gavin Simpson
@GavinSimpson. Sorry - was just using the Salamanders data from your own blog posts and forgot it lives in the glmmTMB package. Question edited and is now fully reproducible. — user2602640
Well, that's embarassing :-) Thanks for the edit. I'll take a look now. — Gavin Simpson

Gavin Simpson Gavin Simpson · Accepted Answer · 2018-11-02T16:45:52

I think this is just an infelicity in the implementation. The warning I am seeing is:

Warning message:
In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'mined' is absent, its contrast will be ignored

This is harmless (at least in this case; I haven't checked other cases) and is generated because there is only a single object$contrasts, and it contains details about mined but this variable is not present in the second linear predictor so R warns that it is going to ignore the contrasts for the mined variable, but this only happens when building the model matrix for the zero-inflation part of the model. The count part correctly uses the mined variable and the correct contrasts.

You could argue that having $contrasts be a list, one per linear predictor would be a better design and then the model matrix would be created using:

model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts[[i]])

but I have no idea if this would break everything else in mgcv.

Currently $contrasts for this model is just:

> zipgam$contrasts
$spp
[1] "contr.treatment"

$mined
[1] "contr.treatment"

$spp
[1] "contr.treatment"

which already shows some redundancy.

zero-inflated GAM prediction with newdata

1 Answers