I have a data set I am trying to fit with bam() in the mgcv package. The model has a binary outcome and I need to specify random intercepts for each animal ID. A subset of the data is below (my actual data is much larger with many more covariates):
dat2 <- read.csv('https://github.com/silasbergen/example_data/raw/main/dat2.csv')
dat2$Animal_id <- factor(dat2$Animal_id)
> head(dat2)
Animal_id DEM_IA Anyrisk
1 105 279.94 0
2 105 278.68 0
3 106 329.13 0
4 106 329.93 0
5 106 332.25 0
6 106 333.52 0
> summary(dat2)
Animal_id DEM_IA Anyrisk
105: 2 Min. :156.3 Min. :0.0000
106: 83252 1st Qu.:246.8 1st Qu.:0.0000
107: 22657 Median :290.1 Median :0.0000
108:104873 Mean :284.8 Mean :0.3619
109:142897 3rd Qu.:318.0 3rd Qu.:1.0000
110: 53967 Max. :411.8 Max. :1.0000
I want to fit the model and predict for new data without the random effect:
library(mgcv)
mod <- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod, newdata = topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE)
But this throws an error:
Error in eval(predvars, data, env) : object 'Animal_id' not found
Why would it need Animal_id
when I am specifically telling it to exclude this term from the prediction? This is also especially strange as I can run the similar examples in the ?random.effects
mgcv
help file, no problem, even if I modify those examples to use bam() instead of gam()! Any help would be greatly appreciated!
EDIT
I may have found a fix; apparently if using discrete=TRUE
in the bam()
model, then predict.bam()
also uses discrete=TRUE
which will not work with missing random effect, but this works:
mod<- bam(Anyrisk ~s(Animal_id,bs="re") + s(DEM_IA), data = dat2, family = "binomial",discrete=TRUE)
topred <- data.frame(DEM_IA = c(280,320))
predict(mod,topred, exclude="s(Animal_id)",newdata.guaranteed = TRUE,discrete=FALSE)
Output:
1 2
-0.4451066 -0.0285989
newdata.guaraneteed
unless you are absolutely certain you have creatednewdata
correctly. While you can omit a term if it's not needed because ofexclude
, and Simon suggests this as one way to ignore ranefs, as you've found this doesn't work all the time with every model. Providing something for all terms in the model will work everywhere. – Gavin Simpsondiscrete=TRUE
is used in thepredict()
function, random effects are not ignored even ifexclude =
is specified. Note:> topred <- data.frame(DEM_IA = c(280,280),Animal_id = dat2$Animal_id[c(1,3)]) > predict(mod,topred, exclude="s(Animal_id)",discrete=TRUE) 1 2 -0.8157566 -0.7150832 > predict(mod,topred, exclude="s(Animal_id)",discrete=FALSE) 1 2 -0.4451066 -0.4451066
– Silas Bergen