I'm having trouble emulating how stat_smooth
calculates it's confidence interval.
Let's generate some data and a simple model:
library(tidyverse)
# sample data
df = tibble(
x = runif(10),
y = x + rnorm(10)*0.2
)
# simple linear model
model = lm(y ~ x, df)
Now use predict()
to generate values and confidence intervals
# predict
df$predicted = predict(
object = model,
newdata = df
)
# predict 95% confidence interval
df$CI = predict(
object = model,
newdata = df,
se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)
Notice that qnorm is used to expand from standard error to 95% CI
Plot the data (black dots), geom_smooth (black line + gray ribbon), and the predicted ribbon (red and blue lines).
ggplot(df) +
aes(x = x, y = y) +
geom_point(size = 2) +
geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
geom_line(aes(y = predicted + CI), color = "blue") + # upper
geom_line(aes(y = predicted - CI), color = "red") + # lower
theme_classic()
The red and blue lines should be the same as the ribbon's edges. What am I doing wrong?
qnorm(...)
withqt(1 - (1-0.95)/2, nrow(df))
, and it perfectly lines up. If you post as an answer, I'll mark it correct. – sharoz