
I am using Spark MLlib's BinaryClassificationMetrics class to generate metrics for the output of a RandomForestClassificationModel. I have gone through the Spark docs and am able to generate thresholds, precisionByThreshold, recallByThreshold, roc, and pr.

I wanted to know whether any particular threshold value is used while generating the ROC. I ask because the Wikipedia article on ROC says:

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
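To make the definition concrete, here is a small standalone sketch (plain Scala, no Spark; the `(score, label)` pairs are made up for illustration) that sweeps each distinct score as a threshold and computes one (FPR, TPR) point per threshold:

```scala
// Hypothetical (score, label) pairs, with 1.0 = positive, 0.0 = negative.
val scored = Seq((0.9, 1.0), (0.8, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0))
val positives = scored.count(_._2 == 1.0).toDouble
val negatives = scored.count(_._2 == 0.0).toDouble

// One (FPR, TPR) point per distinct score, highest threshold first.
val rocPoints = scored.map(_._1).distinct.sorted.reverse.map { t =>
  val tp = scored.count { case (s, l) => s >= t && l == 1.0 }
  val fp = scored.count { case (s, l) => s >= t && l == 0.0 }
  (fp / negatives, tp / positives) // (FPR, TPR)
}
println(rocPoints)
```

The curve starts near (0, 0) at high thresholds and ends at (1, 1) once every example is predicted positive; no single "optimal" threshold is needed to draw it.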

I was wondering whether an optimal threshold value is used while generating the ROC in Spark, and if not, why not.


1 Answer


I believe it's 0.5. BinaryClassificationMetrics uses BinaryLabelCounter, whose label-counting method looks like this:

def +=(label: Double): BinaryLabelCounter = {
  // Though we assume 1.0 for positive and 0.0 for negative, the following check will handle
  // -1.0 for negative as well.
  if (label > 0.5) numPositives += 1L else numNegatives += 1L
  this
}
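To see what this counting does in isolation, here is a minimal standalone re-implementation of just the logic above (an assumption for illustration: Spark's actual BinaryLabelCounter lives inside the metrics machinery and tracks counts per score bucket, not just two totals):

```scala
// Minimal sketch of the label-counting logic, outside Spark.
class LabelCounter {
  var numPositives = 0L
  var numNegatives = 0L

  def +=(label: Double): LabelCounter = {
    // 1.0 (or anything > 0.5) counts as positive; 0.0 and -1.0 count as negative.
    if (label > 0.5) numPositives += 1L else numNegatives += 1L
    this
  }
}

val c = new LabelCounter
Seq(1.0, 0.0, -1.0, 1.0).foreach(c += _)
println(s"${c.numPositives} positives, ${c.numNegatives} negatives")
```

Note that the 0.5 here is compared against the *label* (so that -1.0 is treated as negative alongside 0.0), which is a different thing from a decision threshold applied to the model's scores.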