
To plot a ROC curve:

library(ROCR)
<data cleaning/scrubbing>
<train data>
.....
.....
rf.perf = performance(rf.prediction, "tpr", "fpr")        # for RF
logit.perf = performance(logit.prediction, "tpr", "fpr")  # for logistic reg
tree.perf = performance(tree.prediction, "tpr", "fpr")    # for CART tree
...
plot(rf.perf) # the RF ROC curve

If I want to run an xgboost classification and subsequently plot the ROC curve, I would use objective = "binary:logistic".

I'm confused by xgboost's metrics argument "auc" (page 9 of the CRAN manual); it says it returns the area. How does one plot the curve from TPR and FPR for model comparison?

I tried searching the net and GitHub; most of what I found emphasizes the feature importance graph (for xgboost).

Thanks

Just to clarify: the AUC is the Area Under the Receiver Operating Characteristic (ROC) curve, a metric between 0 and 1. It is not entirely clear to me what your question is; I guess you just want to plot the ROC curve but are having difficulty doing so? Maybe include a reproducible example, that could help. – horseoftheyear

1 Answer


Let me first talk about the ROC curve.

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

In Python this can be done easily:

import matplotlib.pyplot as plt
from sklearn import metrics

def buildROC(target_test, test_preds):
    # FPR/TPR pairs at every decision threshold
    fpr, tpr, threshold = metrics.roc_curve(target_test, test_preds)
    roc_auc = metrics.auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
    plt.legend(loc = 'lower right')
    plt.plot([0, 1], [0, 1], 'r--')  # diagonal = random classifier
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.gcf().savefig('roc.png')

[example ROC curve produced by the code above]

For example, in the image above, at a certain threshold, and at the cost of a false positive rate of 0.2, we get a true positive rate of nearly 0.96 to 0.97.
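To read that point off the curve programmatically rather than by eye, you can pull the FPR and TPR vectors out of a ROCR performance object's slots (using rf.perf from your own code as an example):

fpr.vals = rf.perf@x.values[[1]]
tpr.vals = rf.perf@y.values[[1]]
max(tpr.vals[fpr.vals <= 0.2]) # highest TPR achievable at FPR <= 0.2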

A good reference on ROC curves
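To answer the R/xgboost part directly: metrics = "auc" only reports the area during training; to draw the curve itself you need per-observation predicted probabilities, which predict() returns when objective = "binary:logistic". Here is a minimal sketch with xgboost and ROCR; the names bst, train.matrix, test.matrix, train.label, and test.label are hypothetical, not from your post:

library(xgboost)
library(ROCR)

# Hypothetical data: numeric feature matrices and 0/1 label vectors
bst = xgboost(data = train.matrix, label = train.label,
              nrounds = 50, objective = "binary:logistic")

# With objective = "binary:logistic", predict() returns probabilities,
# which is exactly what ROCR's prediction() expects
xgb.probs = predict(bst, test.matrix)
xgb.prediction = prediction(xgb.probs, test.label)
xgb.perf = performance(xgb.prediction, "tpr", "fpr")

plot(xgb.perf) # plot(..., add = TRUE) overlays it on rf.perf etc. for comparison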