2
votes

I have a lme4 model I have run for a hierarchical logistic regression, and I'm plotting the effects using the effects package. I would like to create an effects graph with the standard error of the mean as the error bars. I can get the point estimates, 95% confidence intervals, and standard errors into a dataframe. The standard errors, however, seem at odds with the confidence limit parameters, see below for an example in a regular glm.

library(effects)
library(dplyr)
mtcars <- mtcars %>%
mutate(vs = factor(vs))
glm1 <- glm(am ~ vs, mtcars, family = "binomial")
(glm1_eff <- Effect("vs", glm1) %>%
 as.data.frame())

  vs       fit        se     lower     upper
1  0 0.3333333 0.4999999 0.1580074 0.5712210
2  1 0.5000000 0.5345225 0.2596776 0.7403224

My understanding is that the fit column displays the point estimate for the probability of am is equal to 1 and that lower and upper correspond to the 95% confidence intervals for the probability that am equals 1. Note that the standard error does not seem to correspond to the confidence interval (e.g., .33+.49 > .57).

Here's what I am shooting for. As opposed to a 95% confidence interval, I would like to have an effects plot with +- the standard error of the mean.

Are the standard errors in log-odds instead of probability? Is there a simply way to convert them to probabilities and plot them so that I can make the graph?

2
Can you tell us where the lme4 package figures into this? (I do not see any "hierarchy.)IRTFM
I do agree this doesn't make sense on a statistical basis. I would contact the maintainer, John Fox. You get his email with maintainer("effects"). (he's not necessarily the author.) I think the default for a 95% CI should be est +/- 1.96 * se. If it's something else, there should be an explanation. I don't see any such explanation. Looks like a brain-fart occurred during construction of the as.data.frame.eff-function. It's probably not a very commonly used function.IRTFM

2 Answers

2
votes

John Fox shared this helpful response:

From ?Effect: "se: (for "eff" objects) a vector of standard errors for the effect, on the scale of the linear predictor." So the standard errors are on the log-odds scale." You could use the delta method to get standard errors on the probability scale but that would be very ill-advised, since the approach to asymptotic normality of estimated probabilities will be much slower than of log-odds. Effect() computes confidence limits on the scale of the linear predictor (log-odds for a logit model) and then inverse-transforms them to the scale of the response (probabilities).

All of the information you need to create a custom plot is in the "eff" object returned by Effect(); the contents of the object are documented in ?Effect.

I agree, by the way, that the as.data.frame.eff() method could be improved, and I'll do that when I have a chance. In particular, it invites misunderstanding to report the effects and confidence limits on the scale of the response but to show standard errors for the linear-predictor scale.

0
votes

I'm answering the mystery first, then addressing the "show SE on the plot" question

  1. Explanation of the SE mystery: All math in a GLM needs to be done on the link scale because this is the additive scale (where stuff can be added up). So...

The values in the column "fit" are the predicted probability of success (or the "predictions on the response scale"). Their values are expit(b0) and expit(b0 + b1). expit() is the inverse logit function. The SEs are on the link scale. An SE on the response scale doesn't make much sense because the response scale is non-linear (although its kinda weird to have stats on the response and link scale in the same table). "lower" and "upper" are on the response scale, so these are the CIs of the predicted probabilities of success. They are computed as expit(b0 ± 1.96SE) and expit(b0 + b1 ± 1.96SE). To recover these values with what is given

library(boot) # inv.logit and logit functions

expit.pred_0 <- 1/3 # fit 0
expit.pred_1 <- 1/2 # fit 1
se1 <- 1/2
se2 <- .5345225

inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
> inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
[1] 0.1580074
> inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
[1] 0.5712211
> inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
[1] 0.2596776
> inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
[1] 0.7403224
  1. Showing an SE computed from a glm on the response (non additive) scale doesn't make any sense because the SE is only additive on the link scale. In other words Multiplying SE by some quantile on the response scale (the scale of the plot you envision, with probability on the y axis) is meaningless. A CI is a point estimate back transformed from the link scale and so makes sense for plotting.

I frequently see researchers plotting SE bars computed from a linear model, like you envision, even though the statistics presented are from a GLM. These SE's are meaningful in a sense I guess but they often imply absurd consequences (like probabilities that could be less than zero or greater than one) so...don't do that either.