3
votes

I have a data set that is modelled as zero-inflated negative binomial with mixed effects. I want to get the confidence intervals from the model predictions and plot the models mean and confidence intervals. I have attempted to plot the model mean. Can someone let me know if this is the correct way to do it? I do not know how to plot the confidence intervals from the models onto ggplot2. I would like to plot the predicted mean of the data along with it's confidence interval. My basic attempt at the graph has been made below in the code.

library(pscl)
library(lmtest)

df <- data.frame(
      fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"), 
      level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"), 
      repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55)
    )    

    model <- formula(repro ∼ fertilizer + level | fertilizer * level)
    modelzinb <- zeroinfl(model, dist = "negbin", link = "logit",data =df)
    summary(modelzinb)

    df$predict <- predict(modelzinb)

    ggplot(df, aes(x=fertilizer, y=predict, color = fertilizer)) + theme_bw() +  stat_summary(aes(color = fertilizer),fun.y = mean, geom = "point", size = 4, position = position_dodge(0.1)) +
      scale_x_discrete("Fertlizer") +
      facet_wrap(.~level)  
1
Where is zeroinfl from? Always include any non-base R package you've been using.Maurits Evers
Sorry about that. I have included it now.Biotechgeek

1 Answers

3
votes

I am not sure what you'd like to plot. But I can show you how to calculate prediction confidence intervals for your zero-inflated negative-binomial model.

Note that I cannot comment on the quality of the fit, or whether fitting such a model to your data makes sense. To answer these kind of questions requires domain-specific knowledge that I don't have.

  1. Let's start by fitting the model to your data.

    library(pscl)
    model <- formula(repro ~ fertilizer + level | fertilizer * level)
    modelzinb <- zeroinfl(model, dist = "negbin", link = "logit", data = df)
    
  2. We can use predict to get model estimates for the response.

    resp <- predict(modelzinb)
    

    Note that in your zero-inflated NB model, predict.zeroinfl (by default) returns the estimated mean response on the scale of the observed counts repro.

    Concerning confidence intervals (CIs), the "issue" here is that precict.zeroinfl does not have an interval argument that allows to calculate CIs directly. On a side note: It seems that pscl::zeroinfl used to include that functionality, see the documentation for version 0.54. Perhaps it would be worth contacting the package maintainer(s) about this.

    The bottom line is, we have to find a different way to calculate prediction CIs. To do so, we use bootstrapping. The R library boot provides all the necessary functions & tools to do that.

  3. To start, we can use the function boot::boot to generate bootstrap replicates of the predicted responses.boot::boot needs a function that generates the quantity of interest (here the predicted response) based on data. So we first define such a function stat. In this particular case stat must take two arguments: the first argument is the original data, the second argument is a vector of row indices (that define the random bootstrap sample of the data). The function will then use bootstrapped data to fit the model, and then use the model to predict the mean response based on the full data.

    stat <- function(df, inds) {
        model <- formula(repro ~ fertilizer + level | fertilizer * level)
        predict(
            zeroinfl(model, dist = "negbin", link = "logit", data = df[inds, ]),
            newdata = df)
    }
    

    In order to ensure reproducibility of results, we set a a fixed random seed and generate 100 bootstrap samples for the estimated mean response

    set.seed(2018)
    library(boot)
    res <- boot(df, stat, R = 100)
    

    The output object res is a list that contains the estimated mean response for the full data in element t0 (verify with all.equal(res$t0, predict(modelzinb))) and the estimated mean response for the bootstrap samples in element t (which is a matrix of dimension R x nrow(df), where R are the number of bootstrap samples).

  4. All that's left to do is to calculate a confidence interval from the estimated mean responses based on the models fitted to the bootstrap samples. To do so we can use the handy function boot::boot.ci. The function allows to calculate different types of CIs (basic, normal, BCa, etc.). Here we use "basic" for demonstration purposes. I do not claim that this is the best choice.

    boot::boot.ci takes an index argument which corresponds to the entry of the estimated mean response vector. The actual intervals are stored in the last 2 columns (column 4 and 5) of the matrix stored as a list element of the boot.ci return object.

    CI <- setNames(as.data.frame(t(sapply(1:nrow(df), function(row)
        boot.ci(res, conf = 0.95, type = "basic", index = row)$basic[, 4:5]))),
        c("lower", "upper"))
    
  5. Now we're done, and we can column-bind the CIs to the original data including predicted mean response values.

    df_all <- cbind.data.frame(df, response = predict(modelzinb), CI)
    #   fertilizer level repro response      lower    upper
    #1           N   low     0 29.72057   5.876731 46.48165
    #2           N   low    90 29.72057   5.876731 46.48165
    #3           N  high     2 21.99345 -15.228956 38.86421
    #4           N  high     4 21.99345 -15.228956 38.86421
    #5           N   low     0 29.72057   5.876731 46.48165
    #6           N   low    80 29.72057   5.876731 46.48165
    #7           N  high     1 21.99345 -15.228956 38.86421
    #8           N  high    90 21.99345 -15.228956 38.86421
    #9           N   low     2 29.72057   5.876731 46.48165
    #10          N   low    33 29.72057   5.876731 46.48165
    #11          N  high    56 21.99345 -15.228956 38.86421
    #12          N  high     0 21.99345 -15.228956 38.86421
    #13          P   low    99 24.22626  -9.225827 46.17656
    #14          P   low   100 24.22626  -9.225827 46.17656
    #15          P  high    66 22.71826   2.595246 39.88333
    #16          P  high    80 22.71826   2.595246 39.88333
    #17          P   low     1 24.22626  -9.225827 46.17656
    #18          P   low     0 24.22626  -9.225827 46.17656
    #19          P  high     2 22.71826   2.595246 39.88333
    #20          P  high    33 22.71826   2.595246 39.88333
    #21          P   low     0 24.22626  -9.225827 46.17656
    #22          P   low     0 24.22626  -9.225827 46.17656
    #23          P  high     1 22.71826   2.595246 39.88333
    #24          P  high     2 22.71826   2.595246 39.88333
    #25          N   low    90 29.72057   5.876731 46.48165
    #26          N   low     5 29.72057   5.876731 46.48165
    #27          N  high     2 21.99345 -15.228956 38.86421
    #28          N  high     2 21.99345 -15.228956 38.86421
    #29          N   low     5 29.72057   5.876731 46.48165
    #30          N   low     8 29.72057   5.876731 46.48165
    #31          N  high     0 21.99345 -15.228956 38.86421
    #32          N  high     1 21.99345 -15.228956 38.86421
    #33          N   low    90 29.72057   5.876731 46.48165
    #34          N   low     2 29.72057   5.876731 46.48165
    #35          N  high     4 21.99345 -15.228956 38.86421
    #36          N  high    66 21.99345 -15.228956 38.86421
    #37          P   low     0 24.22626  -9.225827 46.17656
    #38          P   low     0 24.22626  -9.225827 46.17656
    #39          P  high     0 22.71826   2.595246 39.88333
    #40          P  high     0 22.71826   2.595246 39.88333
    #41          P   low     1 24.22626  -9.225827 46.17656
    #42          P   low     2 24.22626  -9.225827 46.17656
    #43          P  high    90 22.71826   2.595246 39.88333
    #44          P  high     5 22.71826   2.595246 39.88333
    #45          P   low     2 24.22626  -9.225827 46.17656
    #46          P   low     5 24.22626  -9.225827 46.17656
    #47          P  high     8 22.71826   2.595246 39.88333
    #48          P   low    55 24.22626  -9.225827 46.17656
    #...
    

Sample data

df <- data.frame(
    fertilizer = c("N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P","N","N","N","N","N","N","N","N","N","N","N","N","P","P","P","P","P","P","P","P","P","P","P","P"),
    level = c("low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","high","low","low","high","low"),
    repro = c(0,90,2,4,0,80,1,90,2,33,56,0,99,100,66,80,1,0,2,33,0,0,1,2,90,5,2,2,5,8,0,1,90,2,4,66,0,0,0,0,1,2,90,5,2,5,8,55))