4
votes

I'm having trouble emulating how stat_smooth calculates it's confidence interval.

Let's generate some data and a simple model:

library(tidyverse)    
# sample data
df = tibble(
  x = runif(10),
  y = x + rnorm(10)*0.2
)

# simple linear model
model = lm(y ~ x, df)

Now use predict() to generate values and confidence intervals

# predict 
df$predicted = predict(
  object = model,
  newdata = df
)

# predict 95% confidence interval
df$CI = predict(
  object = model,
  newdata = df,
  se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)

Notice that qnorm is used to expand from standard error to 95% CI

Plot the data (black dots), geom_smooth (black line + gray ribbon), and the predicted ribbon (red and blue lines).

ggplot(df) +
  aes(x = x, y = y) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
  geom_line(aes(y = predicted + CI), color = "blue") + # upper
  geom_line(aes(y = predicted - CI), color = "red") + # lower
  theme_classic()

enter image description here

The red and blue lines should be the same as the ribbon's edges. What am I doing wrong?

1
It's probably use confidence intervals based on the t-distribution and not the normal distribution...Dason
Thank you! I replaced qnorm(...) with qt(1 - (1-0.95)/2, nrow(df)), and it perfectly lines up. If you post as an answer, I'll mark it correct.sharoz

1 Answers

4
votes

As posted in a comment by @Dason, the answer is that geom_smooth uses a t-distribution, not a normal distribution.

In my original question, replace qnorm(1 - (1-0.95)/2) with qt(1 - (1-0.95)/2, nrow(df)) for the lines to match up.