i am not getting the same results by following Hilbe's, J. 2011 procedure (herein referred to as "the book") on page 20. The procedure is for computing Poisson regression with robust standard errors using the titanic data set in glm R. Hilbe's source code is in Table 2.4 according to the following link: Negative Binomial Regression Second edition Errata 2012
I believe Some changes to the titanic dataset has occurred since it was published here is the procedure and Stata results from the book and what was performed, which resulted in incorrect or different results in R as of july 2021:
library(COUNT)
data("titanic")
attach(titanic)
library(gmodels)
str(titanic)
'data.frame': 1316 obs. of 4 variables:
$ class : Factor w/ 3 levels "3rd class","1st class",..: 2 2 2 2 2 2 2 2 2 2 ...
$ age : Factor w/ 2 levels "child","adults": 2 2 2 2 2 2 2 2 2 2 ...
..- attr(*, "label")= chr "0=child; 1=adult"
..- attr(*, "format")= chr "%10.0g"
..- attr(*, "value.label.table")= Named int 0 1
.. ..- attr(*, "names")= chr "child" "adults"
$ sex : Factor w/ 2 levels "women","man": 2 2 2 2 2 2 2 2 2 2 ...
..- attr(*, "label")= chr "gender: 0=female; 1=male"
..- attr(*, "format")= chr "%8.0g"
..- attr(*, "value.label.table")= Named int 0 1
.. ..- attr(*, "names")= chr "women" "man"
$ survived: num 2 2 2 2 2 2 2 2 2 2 ...
- attr(*, "stata.info")=List of 5
..$ datalabel : chr "Hilbe, Modeling Count Data (CUP, 2014)"
..$ version : int 12
..$ time.stamp : chr "14 Jul 2014 15:12"
..$ val.labels : chr "class" "age" "sex" "survived"
..$ label.table:List of 4
.. ..$ class : Named int 1 2 3 4
.. .. ..- attr(*, "names")= chr "1st class" "2nd class" "3rd class" "crew"
.. ..$ age : Named int 0 1
.. .. ..- attr(*, "names")= chr "child" "adults"
.. ..$ sex : Named int 0 1
.. .. ..- attr(*, "names")= chr "women" "man"
.. ..$ survived: Named int 0 1
.. .. ..- attr(*, "names")= chr "no" "yes"
the book relevels the class.
titanic$class <- relevel(factor(titanic$class), ref=3)
however, as of 2021 "survive" has become a factor opposed to what i believe used to be a binary 0="no" and 1="yes" integers, therefore, the survive is recoded accordingly
titanic$survived <- as.character(titanic$survived)
titanic$survived [which(titanic$survived =="no")] <- "0"
titanic$survived [which(titanic$survived =="yes")] <- "1"
titanic$survived <- as.integer(titanic$survived)
Code from the 2012 errata:
tit3 <- glm(survived ~ factor(class), family=poisson, data=titanic)
irr <- exp(coef(tit3)) # vector of IRRs
library("sandwich")
rse <- sqrt(diag(vcovHC(tit3, type="HC0"))) # coef robust SEs
irr*rse # IRR robust SEs
irr*rse output in R console
(Intercept) factor(class)1st class factor(class)2nd class
0.01634255 0.19270871 0.15723303
using summary function
> summary(tit3)
Call:
glm(formula = survived ~ factor(class), family = poisson, data = titanic)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1177 -0.7101 -0.7101 0.4364 1.1225
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37783 0.07495 -18.384 < 2e-16 ***
factor(class)1st class 0.90721 0.10268 8.835 < 2e-16 ***
factor(class)2nd class 0.49603 0.11871 4.179 2.93e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 967.81 on 1315 degrees of freedom
Residual deviance: 889.69 on 1313 degrees of freedom
AIC: 1893.7
Number of Fisher Scoring iterations: 5
The following is believed to be the correct estimates because its what is n the book. The Incident Rate Ratio (IRR)is :
class2: 1.642184
class1: 2.477407
and estimates and Robust Std. Err.
class2: 0.1572928
class1: 0.192782
all have P>|z| == 0. Can someone confirm this? Thank you