3
votes

I am trying to use bam to run the following generalized additive model:

m <- bam(result ~ factor(city) + factor(year) + lnpopulation + s(lnincome), data=full_df, na.action=na.omit, family=ziP(theta = NULL, link = "identity",b=0))

But getting the following error:

Error in bam(result : extended families not supported by bam

The documentation for bam mentions the following:

This is a family object specifying the distribution and link to use in fitting etc. See glm and family for more details. The extended families listed in family.mgcv can also be used.

The family.mgcv does include ziP. What am I doing wrong? Any guidance will be appreciated. Thank you!

Re-posting from r-help.

Sincerely,

Milu

1

1 Answers

1
votes

The error message indicates that the version of mgcv you are using does not support these extended families that are otherwise supported by mgcv's gam() function.

Thankfully, the solution to your problem is trivial as Simon Wood has now implemented this feature (extended families within bam()) from version 1.8-19 as noted in the ChangeLog:

1.8-19

** bam() now accepts extended families (i.e. nb, tw, ocat etc)

The current version is 1.8-22 which fixes some bugs related to the functionality you are looking for, so make sure you update to the latest version.

Here is an example modified from ?ziP

 ## function to simulated zip data

 rzip <- function(gamma,theta= c(-2,.3)) {
 ## generate zero inflated Poisson random variables, where 
 ## lambda = exp(gamma), eta = theta[1] + exp(theta[2])*gamma
 ## and 1-p = exp(-exp(eta)).
   y <- gamma; n <- length(y)
   lambda <- exp(gamma)
   eta <- theta[1] + exp(theta[2])*gamma
   p <- 1- exp(-exp(eta))
   ind <- p > runif(n)
   y[!ind] <- 0
   np <- sum(ind)
   ## generate from zero truncated Poisson, given presence...
   y[ind] <- qpois(runif(np,dpois(0,lambda[ind]),1),lambda[ind])
   y
 } 

 library('mgcv')

 ## Simulate some ziP data...
 set.seed(1);n<-400
 dat <- gamSim(1,n=n)
 dat$y <- rzip(dat$f/4-1)

 b <- bam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
          family = ziP(), data = dat)

Which gives for me, the following fitted model:

> summary(b)

Family: Zero inflated Poisson(-1.855,1.244) 
Link function: identity 

Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.97426    0.04988   19.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df      F p-value    
s(x0) 2.396  2.989  2.336  0.0759 .  
s(x1) 2.784  3.464 77.217  <2e-16 ***
s(x2) 7.397  8.317 59.364  <2e-16 ***
s(x3) 1.235  1.428  0.269  0.5888    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Deviance explained =   69%
fREML = 593.94  Scale est. = 1         n = 400