1
votes

In a zero-inflated GAM (ziplss), I'm getting a warning when 1) I use new data and 2) the count portion has categorical variables that are NOT in the zero-inflation portion. There's a warning for every categorical variable not represented in the zero-inflation part.

Here's a reproducible example:

library(mgcv)
library(glmmTMB)
data(Salamanders)   
Salamanders$x <- rnorm(nrow(Salamanders), 0, 10)

zipgam <- gam(list(count ~ spp * mined + s(x) + s(site, bs = "re"),
                ~ spp),
           data = Salamanders, family = ziplss, method = "REML")

preds.response <- data.frame(Predict = predict(zipgam, type = "response"))

nd <- data.frame(x = 0, spp = "GP", mined = "yes", site = Salamanders$site[1])      
nd$pred <- predict(zipgam, newdata = nd, exclude="site")

I haven't seen this mentioned anywhere, which is odd and tells me that I'm likely doing something wrong (otherwise this would be available in search results). Would appreciate any insight.

1
What is Salamander? This isn't reproducible if we don't have that object.Gavin Simpson
@GavinSimpson. Sorry - was just using the Salamanders data from your own blog posts and forgot it lives in the glmmTMB package. Question edited and is now fully reproducible.user2602640
Well, that's embarassing :-) Thanks for the edit. I'll take a look now.Gavin Simpson

1 Answers

2
votes

I think this is just an infelicity in the implementation. The warning I am seeing is:

Warning message:
In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'mined' is absent, its contrast will be ignored

This is harmless (at least in this case; I haven't checked other cases) and is generated because there is only a single object$contrasts, and it contains details about mined but this variable is not present in the second linear predictor so R warns that it is going to ignore the contrasts for the mined variable, but this only happens when building the model matrix for the zero-inflation part of the model. The count part correctly uses the mined variable and the correct contrasts.

You could argue that having $contrasts be a list, one per linear predictor would be a better design and then the model matrix would be created using:

model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts[[i]])

but I have no idea if this would break everything else in mgcv.

Currently $contrasts for this model is just:

> zipgam$contrasts
$spp
[1] "contr.treatment"

$mined
[1] "contr.treatment"

$spp
[1] "contr.treatment"

which already shows some redundancy.