4
votes

Suppose I have two matrices: A, a label matrix, and B, the matrix of predicted probabilities corresponding to A. I would like to calculate the AUPR (Area Under the Precision/Recall Curve) from A and B. For the usual AUC (Area Under the ROC Curve), several R packages, such as ROCR and pROC, can calculate the value directly. Which R packages can calculate the AUPR, or how can I compute it myself? Here are the two example matrices:

> pp
        [,1]    [,2]     [,3]    [,4]    [,5]     [,6]    [,7]
[1,] 0.01792 0.00155 -0.00140 0.00522 0.01320  0.22506 0.00454
[2,] 0.05883 0.11256  0.82862 0.12406 0.08298 -0.00392 0.30724
[3,] 0.00743 0.06357  0.14500 0.00213 0.00545  0.03452 0.11189
[4,] 0.02571 0.01460  0.01108 0.00494 0.01246  0.11880 0.05504
[5,] 0.02407 0.00961  0.00720 0.00382 0.01039  0.10974 0.04512

> ll
        D00040 D00066 D00067 D00075 D00088 D00094 D00105
hsa190       0      0      0      0      0      1      0
hsa2099      0      1      1      0      0      0      1
hsa2100      0      0      0      0      0      0      1
hsa2101      0      0      0      0      0      0      0
hsa2103      0      0      0      0      0      0      0

Here pp is the matrix of predicted probabilities and ll is the matrix of true labels.

Thanks in advance.

1 Answer

5
votes

I would first convert the prediction scores and the class labels from matrices into vectors.
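As a minimal sketch of that flattening step, using the pp and ll values from the question (re-entered by hand here, without the row and column names): as.vector() reads both matrices in the same column-major order, so each score stays paired with its label. The names scores, labels, pos_scores and neg_scores are only illustrative. Note that this pools every cell of the matrix into a single set of predictions, which I am assuming is what is wanted.

```r
# Scores and labels from the question, entered row by row.
pp <- matrix(c(
  0.01792, 0.00155, -0.00140, 0.00522, 0.01320,  0.22506, 0.00454,
  0.05883, 0.11256,  0.82862, 0.12406, 0.08298, -0.00392, 0.30724,
  0.00743, 0.06357,  0.14500, 0.00213, 0.00545,  0.03452, 0.11189,
  0.02571, 0.01460,  0.01108, 0.00494, 0.01246,  0.11880, 0.05504,
  0.02407, 0.00961,  0.00720, 0.00382, 0.01039,  0.10974, 0.04512
), nrow = 5, byrow = TRUE)

ll <- matrix(c(
  0, 0, 0, 0, 0, 1, 0,
  0, 1, 1, 0, 0, 0, 1,
  0, 0, 0, 0, 0, 0, 1,
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0
), nrow = 5, byrow = TRUE)

# Both matrices must have the same shape for the pairing to be valid.
stopifnot(identical(dim(pp), dim(ll)))

# Flatten in the same (column-major) order so score i matches label i.
scores <- as.vector(pp)
labels <- as.vector(ll)

# Split the scores by true class.
pos_scores <- scores[labels == 1]
neg_scores <- scores[labels == 0]
```

pos_scores and neg_scores are then the two per-class score vectors that a PR-curve function can take as input.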

The "PRROC" package provides functions similar to those in "ROCR" for generating ROC and PR curves, and it also reports the area under the PR curve.

As an example, I'll use the ROCR.simple data set from the "ROCR" package.

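library(PRROC)
library(ROCR)   # only needed for the ROCR.simple example data

data("ROCR.simple")
scores <- data.frame(prediction = ROCR.simple$predictions,
                     label = ROCR.simple$labels)

# scores.class0 takes the scores of the positive class (label 1),
# scores.class1 the scores of the negative class (label 0).
pr <- pr.curve(scores.class0 = scores$prediction[scores$label == 1],
               scores.class1 = scores$prediction[scores$label == 0],
               curve = TRUE)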

Note that in this function, "scores.class0" must hold the scores of the positive class. This is a little confusing, because I would normally think of 0 as negative and 1 as positive, which is why the labels 1 and 0 appear swapped relative to the argument names in the call above.
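If splitting the scores by class feels error-prone, pr.curve also accepts all scores in a single vector together with weights.class0, where a weight of 1 marks a positive and 0 a negative. The following is a sketch of that variant on the same ROCR.simple data; the names pred, lab and pr_w are only illustrative, and I would expect the resulting area to agree with the two-vector call.

```r
library(PRROC)
library(ROCR)   # only needed for the ROCR.simple example data

data("ROCR.simple")
pred <- ROCR.simple$predictions
lab  <- ROCR.simple$labels        # 1 = positive, 0 = negative

# Pass every score once and mark the positives through weights.class0;
# the negative-class weights then default to 1 - weights.class0.
pr_w <- pr.curve(scores.class0 = pred, weights.class0 = lab, curve = TRUE)

# Area under the PR curve computed by interpolation
pr_w$auc.integral
```

With this form the 0/1 labels are used as they are, so there is no need to remember which class goes into which argument.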

This way, the PR curve and its AUC are both stored in the pr variable.

pr

Precision-recall curve

Area under curve (Integral):
 0.7815038 

Area under curve (Davis & Goadrich):
 0.7814246 

Curve for scores from  0.005422562  to  0.9910964 
( can be plotted with plot(x) )

Then, you can plot the PRC with plot(pr) or with ggplot:

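library(ggplot2)
# pr$curve has three columns: recall (V1), precision (V2), threshold (V3)
y <- as.data.frame(pr$curve)
ggplot(y, aes(V1, V2)) + geom_path() + ylim(0, 1) + labs(x = "Recall", y = "Precision")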

The resulting curve is the same as the one produced by the ROCR package.