4
votes

So I'm using R to do logistic regression, but I'm using offsets.

    mylogit <- glm(Y ~ X1 + offset(0.2*X2) + offset(0.4*X3), data = test, family = "binomial")

The output shows only two coefficients: the intercept and one of the predictors, X1.

    Coefficients:
    (Intercept)               X1
      0.5250748         0.0157259

My question: how do I get the raw prediction for each observation from this model? More specifically, if I use the predict function, will it include all the features and their coefficients, even though the model's coefficients are listed as containing only the intercept and X1?

    prob <- predict(mylogit, test, type = "response")

Do I have to use the predict function? Does the "mylogit" object contain anything I can compute from directly? (Yes, I looked at the documentation on glm; I'm still confused.)

Thank you for your patience.

1 Answer

7
votes

I can report the results of some experiments with glm and offset(). From this experiment at least, it does not appear that your call to predict will give results that take the offset into account; rather, it seems that summary.glm is needed for that purpose. I started with a rather mangled modification of the first example in ?glm (this would be more pertinent to your concerns if you had provided data, because then we could experiment with the newdata argument that you would need for "test").

counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
print(d.AD <- data.frame(treatment, outcome, counts))
glm.D93 <- glm(counts ~ outcome + treatment + offset(1:9), family = poisson())
glm.D93d <- glm(counts ~ outcome + treatment , family = poisson())

> predict(glm.D93d, type="response")
       1        2        3        4        5        6        7        8        9 
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 
> predict(glm.D93, type="response")
       1        2        3        4        5        6        7        8        9 
21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 21.00000 13.33333 15.66667 
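One possible reading of the identical predictions above, offered as a sketch rather than a definitive diagnosis: the offset 1:9 happens to equal 1 + (outcome level - 1) + 3 * (treatment level - 1), so the factor terms can absorb it exactly. The check below regresses the offset on the same design and then refits with a made-up offset (`off2`, a hypothetical log-exposure that is not in the original experiment) that the design cannot absorb.

```r
# Data from the experiment above
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Is the offset 1:9 a linear combination of the design columns?
off1 <- as.numeric(1:9)
max_resid <- max(abs(resid(lm(off1 ~ outcome + treatment))))
max_resid  # essentially zero: the coefficients can absorb this offset

# Hypothetical offset (assumption, for illustration only) that is
# NOT additive in outcome and treatment
off2 <- log(c(1, 2, 1, 3, 1, 2, 5, 1, 2))

fit_none <- glm(counts ~ outcome + treatment, family = poisson())
fit_off1 <- glm(counts ~ outcome + treatment + offset(off1), family = poisson())
fit_off2 <- glm(counts ~ outcome + treatment + offset(off2), family = poisson())

# Same fitted values when the offset is absorbed, different otherwise
same_off1 <- isTRUE(all.equal(fitted(fit_none), fitted(fit_off1)))
same_off2 <- isTRUE(all.equal(fitted(fit_none), fitted(fit_off2)))
c(same_off1 = same_off1, same_off2 = same_off2)
```

If the second comparison differs, that would suggest the matching predictions are a property of this particular offset rather than of predict itself.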

As far as I can tell, the offset is only apparent when the estimated coefficients are compared to their null value (usually 0) for the purposes of statistical inference. That is done by summary.glm:

> summary(glm.D93)$coef
             Estimate Std. Error    z value      Pr(>|z|)
(Intercept)  2.044522  0.1708987  11.963362  5.527764e-33
outcome2    -1.454255  0.2021708  -7.193203  6.328878e-13
outcome3    -2.292987  0.1927423 -11.896644  1.232021e-32
treatment2  -3.000000  0.2000000 -15.000000  7.341915e-51
treatment3  -6.000000  0.2000000 -30.000000 9.813361e-198
> summary(glm.D93d)$coef
                 Estimate Std. Error       z value     Pr(>|z|)
(Intercept)  3.044522e+00  0.1708987  1.781478e+01 5.426767e-71
outcome2    -4.542553e-01  0.2021708 -2.246889e+00 2.464711e-02
outcome3    -2.929871e-01  0.1927423 -1.520097e+00 1.284865e-01
treatment2   1.337909e-15  0.2000000  6.689547e-15 1.000000e+00
treatment3   1.421085e-15  0.2000000  7.105427e-15 1.000000e+00

In this example the offset only changes the reference levels (with fairly bizarre changes in this mangled example), while the fit of $linear.predictors and $fitted to the data is not affected. The shifts are exact: the intercept drops by 1, the outcome coefficients by 1 and 2, and the treatment coefficients by 3 and 6, which together add up to the offset 1:9. I didn't see a comment in ?glm that addresses this, but there is a comment in ?lm: "Offsets specified by offset will not be included in predictions by predict.lm, whereas those specified by an offset term in the formula will be." I will admit that I got very little insight from reading ?model.offset.
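For the model in the question, the ?lm comment quoted above suggests that offsets written in the formula do enter the predictions. A minimal sketch to check this by hand follows, assuming simulated data (the data frame `test` below is invented for illustration, since the original data were not posted). It rebuilds the linear predictor from coef(mylogit) plus the two offset terms and compares it with predict.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the poster's data (assumption, not the real data)
test <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
test$Y <- rbinom(n, 1, plogis(0.5 + 0.3 * test$X1 + 0.2 * test$X2 + 0.4 * test$X3))

mylogit <- glm(Y ~ X1 + offset(0.2 * X2) + offset(0.4 * X3),
               data = test, family = "binomial")

# Linear predictor by hand: estimated terms plus the fixed offset terms
b <- coef(mylogit)
eta_hand  <- b["(Intercept)"] + b["X1"] * test$X1 + 0.2 * test$X2 + 0.4 * test$X3
prob_hand <- plogis(eta_hand)  # inverse logit gives the probability

# The same quantities from predict()
eta_pred  <- predict(mylogit, newdata = test, type = "link")
prob_pred <- predict(mylogit, newdata = test, type = "response")

link_ok <- isTRUE(all.equal(unname(eta_hand), unname(eta_pred)))
prob_ok <- isTRUE(all.equal(unname(prob_hand), unname(prob_pred)))

# The fitted object also stores the summed offset for the training rows
offset_ok <- isTRUE(all.equal(unname(mylogit$offset),
                              0.2 * test$X2 + 0.4 * test$X3))
c(link_ok = link_ok, prob_ok = prob_ok, offset_ok = offset_ok)
```

So predict is not strictly required: coef(mylogit), together with the offset terms you supplied, is enough to compute the predictions directly, and mylogit$linear.predictors and mylogit$fitted.values hold the same quantities for the rows used in fitting.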