The pscl
package in R
is used to fit, among other models, zero-inflated Poisson models.
I have a data frame on weekly food consumption, which is zero-inflated. I had no issues fitting the zero-inflated Poisson (ZIP) model, which successfully predicts the percentage of zero counts I have in my data. The data contains about 30% zero counts, with just over 4600 observations.
The ZIP model was:
zeroinfl(LightC.home~SEX+AGEDET+Inhabitants, data=FoodAnalysis)
Where LightC
is the weekly consumption count (range 0 - 74), SEX
is respondent sex (factor with 2 levels), AGEDET
is respondent ageband (factor with 12 levels, e.g. "5-9 years"), and Inhabitants
is the number of people in the household (integer, range 1 to 8).
The ZIP model summary is:
Call:
zeroinfl(formula = LightC.home ~ SEX + AGEDET + Inhabitants, data = FoodAnalysis)
Pearson residuals:
Min 1Q Median 3Q Max
-3.3479 -1.2781 -0.5170 0.6342 12.2912
Count model coefficients (poisson with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.133213 0.029229 72.983 < 2e-16 ***
SEXFemale -0.079156 0.010284 -7.697 1.39e-14 ***
AGEDET5-9 years -0.038153 0.032745 -1.165 0.243963
AGEDET10-14 years 0.014199 0.032411 0.438 0.661317
AGEDET15-17 years 0.253942 0.035419 7.170 7.52e-13 ***
AGEDET18-24 years 0.148395 0.029089 5.101 3.37e-07 ***
AGEDET25-34 years 0.158506 0.026291 6.029 1.65e-09 ***
AGEDET35-44 years 0.157821 0.026043 6.060 1.36e-09 ***
AGEDET45-54 years 0.307299 0.026153 11.750 < 2e-16 ***
AGEDET55-64 years 0.340590 0.026913 12.655 < 2e-16 ***
AGEDET65-74 years 0.361976 0.027260 13.278 < 2e-16 ***
AGEDET75-84 years 0.251614 0.039866 6.311 2.76e-10 ***
AGEDET85 years or more 0.606829 0.083345 7.281 3.32e-13 ***
Inhabitants 0.014697 0.004302 3.416 0.000635 ***
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.82861 0.17389 -4.765 1.89e-06 ***
SEXFemale -0.08333 0.07196 -1.158 0.246893
AGEDET5-9 years -0.27280 0.18082 -1.509 0.131390
AGEDET10-14 years -0.32820 0.18390 -1.785 0.074308 .
AGEDET15-17 years -0.06472 0.20589 -0.314 0.753276
AGEDET18-24 years -0.06973 0.16047 -0.435 0.663896
AGEDET25-34 years -0.27054 0.14791 -1.829 0.067385 .
AGEDET35-44 years -0.71412 0.15648 -4.564 5.03e-06 ***
AGEDET45-54 years -0.50510 0.15585 -3.241 0.001191 **
AGEDET55-64 years -0.65281 0.16541 -3.947 7.92e-05 ***
AGEDET65-74 years -1.09276 0.18313 -5.967 2.41e-09 ***
AGEDET75-84 years -1.28092 0.34508 -3.712 0.000206 ***
AGEDET85 years or more -12.75039 279.21031 -0.046 0.963577
Inhabitants 0.01733 0.02872 0.603 0.546296
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Number of iterations in BFGS optimization: 35
Log-likelihood: -2.136e+04 on 28 Df
I used the predprob
command in the pscl
package to construct the vector of probabilities of consumption for the food. That constructed a matrix, one row per observation, with each observation given a probability of consumption for each count between 0 (minimum) and 74 (maximum). There are 165 unique rows in this matrix, which are due to the 165 unique combinations of predictors in the zero-inflated Poisson model.
How can I construct the formula underpinning those 75 probabilities for each combination of predictors (e.g. female, aged 25-34 years, 4 inhabitants)? I don't understand the relationship between the 75 probabilities from a mathematical perspective.
For example, the predictor combination SEX=="Female"
, AGEDET=="25-34 years"
and Inhabitants == 4
, gives these probabilities for each person with that predictor combination in the data (only first 6 of 75 shown):
X0 X1 X2 X3 X4 X5
0.2473285 0.0004504804 0.002183136 0.007053337 0.01709108 0.03313101
How were those probabilities estimated, given that the original zeroinfl
formula looks like it's simply estimating the mean value of consumption for the combination of predictors and not all 75 probabilities?