1
votes

I am doing a meta-analysis on the performance of certain risk assessment instruments. My goal is to pool the AUC estimates of several validity studies for a particular instrument. However, I came across a few studies that do not provide the AUC estimate itself, but rather only present the ROC curve. In such cases I have used https://apps.automeris.io/wpd/ to get the values corresponding to each data point. However, the problem is that even though I have the values for sensitivity and 1-specificity and can use R to plot the ROC curve myself, I don't know which function to use in order to calculate the Area Under the Curve (AUC). This is due to the fact that all R packages/functions that allow me to calculate the AUC use the underlying data as input. That is, the predictor and the response rather than the values for sensitivity and 1-specificity.

I have read the documentation for the 'pROC' package in R, but did not find anything helpful. I guess I could just integrate the area under the curve of plot using integrate()? The problem with that is, however, that I would not receive the confidence intervals for the AUC (which I need in my meta-analysis).

Here is the data that I generated from one of the ROC-curves (by using https://apps.automeris.io/wpd/):

# data table:
library(tibble)
AUC_data_1 <- tibble("1-specificity" = c(-0.0031751800795011,
0.05421559172249585, 0.12174003874893036, 0.20579144833428253,
0.3012443157265138, 0.502266554865223, 0.6205366469297053,
0.8417661384716209),
sensitivity = c(0.002260831241825745, 0.16879823941344285,
0.45899739288954267, 0.5804040305755962, 0.7849062327396981,
0.8634686874873007, 0.9710785309748188, 0.9977448923815709))

# roc curve generated from data:
plot(AUC_data_1)

I would like to calculate the AUC from this ROC-curve. However, since I don't have the underlying data (i.e., response and predictor), I can't use the pROC package in R.

1
This doesn't really look like a ROC curve. The first 1-specificity value is negative, and it is not anchored at (0, 0) and (1, 1). – Calimo
You are right. The numbers are just approximations, as they are extracted directly from the graph itself. I should have cleaned them up before posting them. Sorry for that. – Matthias

1 Answer

0
votes

The first thing you need to do is clean up your data. A ROC curve starts at (0, 0) and ends at (1, 1); if these anchor points are missing from your curve, the AUC will be underestimated. Here is an attempted fix:

AUC_data_1 <- tibble(one.minus.specificity = c(0,
0.05421559172249585, 0.12174003874893036, 0.20579144833428253,
0.3012443157265138, 0.502266554865223, 0.6205366469297053,
0.8417661384716209, 1),
sensitivity = c(0, 0.16879823941344285,
0.45899739288954267, 0.5804040305755962, 0.7849062327396981,
0.8634686874873007, 0.9710785309748188, 0.9977448923815709, 1))

Make sure you understand the quality and reliability of the data you get from this service.

Then, as you guessed, it's just an integration game. I like the trapz function from the pracma package, which uses the trapezoidal rule:

library(pracma)
trapz(AUC_data_1$one.minus.specificity, AUC_data_1$sensitivity)
[1] 0.7846169
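If you prefer to avoid the pracma dependency, the trapezoidal rule is a one-liner in base R. Here is a sketch using the cleaned coordinates from above, so the result should match trapz:

```r
# cleaned ROC coordinates, anchored at (0, 0) and (1, 1)
x <- c(0, 0.05421559172249585, 0.12174003874893036, 0.20579144833428253,
       0.3012443157265138, 0.502266554865223, 0.6205366469297053,
       0.8417661384716209, 1)
y <- c(0, 0.16879823941344285, 0.45899739288954267, 0.5804040305755962,
       0.7849062327396981, 0.8634686874873007, 0.9710785309748188,
       0.9977448923815709, 1)

# trapezoidal rule: sum of interval widths times mean of the two heights
auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
auc
## [1] 0.7846169
```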

Regarding uncertainties, you should have a look at the work by Obuchowski (1), which expresses the variance of the AUC as a function of the AUC itself, the sample sizes, and the parameters of the binormal fit, all of which you can obtain from the sensitivity and specificity alone. The binormal parameters come from a linear fit on the probit scale (drop the (0, 0) and (1, 1) anchor points first, since qnorm is infinite there):

 model <- lm(qnorm(sensitivity) ~ qnorm(one.minus.specificity), AUC_data_1[2:8, ])

You will find the code directly in the pROC source, although it is private and you should use it at your own risk (the functions are not exported and might disappear at any time). Something like this:

A <- coefficients(model)[1]  # binormal intercept
B <- coefficients(model)[2]  # binormal slope
kappa <- n.controls / n.cases  # ratio of control to case observations, from the study
# use internal function at your own risk
pROC:::var.params.obuchowski(A, B, kappa) / n.cases

They also propose an approach that doesn't need the binormal coefficients:

A <- qnorm(theta) * 1.414  # 1.414 approximates sqrt(2)
(0.0099 * exp(-A^2/2)) * ((5 * A^2 + 8) + (A^2 + 8)/kappa) / n.cases

Where theta is the AUC of the curve you calculated above.
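Putting it together, here is a minimal sketch of how to get the confidence interval the meta-analysis needs, using a normal (Wald) approximation. The counts n.cases and n.controls below are placeholders I made up for illustration; substitute the actual numbers reported by the original study:

```r
theta <- 0.7846169   # AUC from trapz above
n.cases <- 50        # hypothetical; use the study's actual case count
n.controls <- 50     # hypothetical; use the study's actual control count
kappa <- n.controls / n.cases

# Obuchowski's variance approximation without the binormal coefficients
A <- qnorm(theta) * 1.414
v <- (0.0099 * exp(-A^2 / 2)) * ((5 * A^2 + 8) + (A^2 + 8) / kappa) / n.cases

# Wald 95% confidence interval for the AUC
ci <- theta + c(-1, 1) * qnorm(0.975) * sqrt(v)
round(ci, 3)
```

With the AUC and its variance (or standard error) per study, you have everything a standard inverse-variance pooling needs.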

  1. Nancy A. Obuchowski, Donna K. McClish (1997). "Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices". Statistics in Medicine, 16(13), 1529–1542. DOI: 10.1002/(SICI)1097-0258(19970715)16:13<1529::AID-SIM565>3.0.CO;2-H.
  2. Nancy A. Obuchowski, Michael L. Lieber and Frank H. Wians Jr. (2004). "ROC Curves in Clinical Chemistry: Uses, Misuses, and Possible Solutions". Clinical Chemistry, 50, 1118–1125. DOI: 10.1373/clinchem.2004.031823.