From Clinical Prediction Models by Ewout W. Steyerberg we have the following:

A calibration plot has predictions on the x axis, and the outcome on the y axis. A line of identity helps for orientation: Perfect predictions should be on the 45° line. For linear regression, the calibration plot results in a simple scatter plot. For binary outcomes, the plot contains only 0 and 1 values for the y axis. Probabilities are not observed directly. However, smoothing techniques can be used to estimate the observed probabilities of the outcome (p(y = 1)) in relation to the predicted probabilities. The observed 0/1 outcomes are replaced by values between 0 and 1 by combining outcome values of subjects with similar predicted probabilities, e.g. using the loess algorithm.

I'm fitting a logistic regression model with a binary outcome. Below is some example code. The calibration curve is going to look weird because the sample is so small; I'm mostly wondering whether the methodology is correct.

library(tidyverse)

tibble_ex <- tibble(
  event = c(1, 0, 1, 0, 0, 1),
  weight = c(100, 200, 110, 210, 220, 105)
)

model <- glm(event ~ weight, family = 'binomial', data = tibble_ex) 

tibble_ex <- tibble_ex %>%
  mutate(pred = predict(model, type = 'response'))

tibble_ex %>%
  arrange(pred) %>%
  ggplot(aes(x = pred, y = event)) +
  stat_smooth(method = 'glm', method.args = list(family = binomial), se = F) +
  geom_abline()

1 Answer

You are only missing the smoothing part of the plot. If you want to use glm to draw the curve, you have to combine it with splines.

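library(splines)  # provides ns(), used in the smoothing formula

tibble_ex %>%
  arrange(pred) %>%
  ggplot(aes(x = pred, y = event)) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  # family = binomial keeps the fitted curve between 0 and 1;
  # linewidth replaces size for lines in ggplot2 >= 3.4
  stat_smooth(method = "glm", formula = y ~ ns(x, 1),
              method.args = list(family = binomial), linewidth = 1) +
  geom_abline()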

However, I have noticed that Steyerberg and Harrell prefer the use of loess smoothing.

tibble_ex %>%
  arrange(pred) %>%
  ggplot(aes(x = pred, y = event)) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  geom_smooth(color = "red", se = FALSE, method = "loess") +
  # you can use stat_smooth in place of geom_smooth
  geom_abline()
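
If you want the smoothed values themselves rather than just the plot, something like the following might work. It is only a rough sketch: the data are simulated (the coefficients, sample size, and names such as `model_sim` and `smooth_fit` are my own assumptions, not part of the question), because six rows are too few for loess to say much. It calls `stats::loess` directly with its default span and uses base graphics to stay self-contained.

```r
# Simulated data for illustration only (not the 6-row tibble from the question)
set.seed(1)
n <- 500
weight <- rnorm(n, mean = 150, sd = 30)
event <- rbinom(n, size = 1, prob = plogis(6 - 0.04 * weight))

# Fit the logistic model and get predicted probabilities
model_sim <- glm(event ~ weight, family = binomial)
pred <- fitted(model_sim)

# Smooth the observed 0/1 outcomes against the predicted probabilities
smooth_fit <- loess(event ~ pred)  # default span = 0.75
grid <- seq(min(pred), max(pred), length.out = 50)
observed <- predict(smooth_fit, newdata = data.frame(pred = grid))

# Calibration curve with the line of identity for reference
plot(grid, observed, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Predicted probability", ylab = "Observed proportion (loess)")
abline(0, 1, lty = 2)
```

Here `observed` holds the loess estimate of the outcome proportion at each value in `grid`, so you can inspect how far the curve departs from the 45° line at any predicted probability.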

I also want to point to Frank Harrell's rms package. It has many helpful functions for fitting and validating models, including calibration plots. The code below plots the calibration curve and provides other statistics.

library(rms)
val.prob(fitted(model),tibble_ex$event)
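
Two of the usual summary statistics for calibration are the calibration intercept and slope. As a hedged illustration of what those numbers mean, here is a minimal sketch that estimates them with plain `glm` on simulated development and validation data. This is not the internal code of `val.prob`; the helper `simulate_data`, the coefficients, and the sample sizes are all assumptions made up for the example.

```r
# Hypothetical helper: simulate data from a known logistic model
set.seed(2)
simulate_data <- function(n) {
  weight <- rnorm(n, mean = 150, sd = 30)
  event <- rbinom(n, size = 1, prob = plogis(6 - 0.04 * weight))
  data.frame(weight = weight, event = event)
}
dev <- simulate_data(500)   # development data
val <- simulate_data(2000)  # independent validation data

model_sim <- glm(event ~ weight, family = binomial, data = dev)

# Linear predictor (log-odds) of the fitted model on the validation data
val$lp <- predict(model_sim, newdata = val, type = "link")

# Logistic recalibration: regress the outcome on the linear predictor.
# A slope near 1 and an intercept near 0 suggest good calibration.
recal_fit <- glm(event ~ lp, family = binomial, data = val)
cal_intercept <- unname(coef(recal_fit)["(Intercept)"])
cal_slope <- unname(coef(recal_fit)["lp"])
c(intercept = cal_intercept, slope = cal_slope)
```

Note that if you run this on the same data the model was fitted to, the intercept and slope come out as 0 and 1 by construction, so these statistics are only informative on new data or with resampling.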