4
votes

I have a logistic regression model where I explicitly set the threshold to 0.5:

model.setThreshold(0.5)

I train the model and then want to get basic statistics -- precision, recall, etc.

This is what I do when I evaluate the model:

val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold

precision.foreach { case (t, p) =>
  println(s"Threshold is: $t, Precision is: $p")
}

I get results with only 0.0 and 1.0 as threshold values; 0.5 is completely ignored.

Here is the output of the above loop:

Threshold is: 1.0, Precision is: 0.8571428571428571

Threshold is: 0.0, Precision is: 0.3005181347150259

When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.

How do I get the precision and recall values with threshold as 0.5?


3 Answers

5
votes

You need to clear the model threshold before you make predictions. Clearing the threshold makes your predictions return a score rather than the classified label; otherwise you will only ever have two thresholds, i.e. your labels 0.0 and 1.0.

model.clearThreshold()

A tuple from predictionAndLabels should look like (0.6753421, 1.0), not (1.0, 1.0).
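To see why clearing the threshold matters, here is a plain-Python sketch (not Spark's actual implementation) of what precisionByThreshold does: every distinct score in the input becomes a candidate threshold, so hard 0/1 predictions can only ever yield the two thresholds 0.0 and 1.0. The function name and the "predict positive iff score >= threshold" convention are assumptions for illustration.

```python
def precision_by_threshold(score_and_labels):
    """For each distinct score, compute precision when we predict
    positive iff score >= threshold (illustrative, not Spark's code)."""
    thresholds = sorted({s for s, _ in score_and_labels}, reverse=True)
    result = []
    for t in thresholds:
        tp = sum(1 for s, l in score_and_labels if s >= t and l == 1.0)
        fp = sum(1 for s, l in score_and_labels if s >= t and l == 0.0)
        result.append((t, tp / (tp + fp)))
    return result

# With raw scores (threshold cleared), there is a point per distinct score:
scored = [(0.9, 1.0), (0.7, 1.0), (0.6, 0.0), (0.2, 0.0)]
print(precision_by_threshold(scored))  # thresholds 0.9, 0.7, 0.6, 0.2

# With hard 0/1 predictions, only two "thresholds" can ever appear:
hard = [(1.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
print(precision_by_threshold(hard))  # thresholds 1.0 and 0.0 only
```

This is exactly the symptom in the question: with the model threshold set, the "scores" are already 0.0/1.0 labels, so no 0.5 point can exist.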

Take a look at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala

You probably still want to set numBins to control the number of points if the input is large.

1
votes

I think what happens is that all the predictions are 0.0 or 1.0, so the intermediate threshold values make no difference.

Consider the numBins argument of BinaryClassificationMetrics:

numBins: if greater than 0, then the curves (ROC curve, PR curve) computed internally will be down-sampled to this many "bins". If 0, no down-sampling will occur. This is useful because the curve contains a point for each distinct score in the input, and this could be as large as the input itself -- millions of points or more, when thousands may be entirely sufficient to summarize the curve. After down-sampling, the curves will instead be made of approximately numBins points. Points are made from bins of equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), which means the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, meaning there may be an extra sample at partition boundaries.

So if you don't set numBins, then precision will be calculated at all the different prediction values. In your case this seems to be just 0.0 and 1.0.
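The bin arithmetic in that description can be sketched in plain Python (a toy, single-partition view; Spark's real implementation bins within each partition and aggregates, but the floor(count / numBins) sizing is the same idea):

```python
def downsample(points, num_bins):
    """Group consecutive points into bins of floor(len/num_bins) each.
    Toy single-partition sketch of the behavior the docs describe."""
    if num_bins <= 0 or len(points) <= num_bins:
        return [[p] for p in points]  # no down-sampling needed
    size = len(points) // num_bins    # floor(count / numBins)
    return [points[i:i + size] for i in range(0, len(points), size)]

# 10 points into numBins = 4: bin size = floor(10/4) = 2,
# which actually yields 5 bins -- "may not exactly equal numBins".
bins = downsample(list(range(10)), 4)
print(len(bins), [len(b) for b in bins])
```

So numBins is an approximation target, not an exact count.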

0
votes

First, try adding more bins like this (here numBins is 10):

val metrics = new BinaryClassificationMetrics(probabilitiesAndLabels, 10)

If you still only have the two thresholds 0 and 1, check how you have defined your predictionAndLabels. You may be having this problem if you accidentally provided (label, prediction) instead of (prediction, label).
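A quick way to spot the swapped-tuple mistake: if the pairs were built as (label, score), the first element of every tuple takes only the values 0.0 and 1.0. A hypothetical sanity check in plain Python (the helper name is made up; note it also fires when the model threshold was never cleared, which produces the same symptom):

```python
def looks_swapped(prediction_and_labels):
    """True if every first element is a hard 0.0/1.0 -- suspicious for
    (label, prediction) order or for predictions made with a set threshold."""
    firsts = {p for p, _ in prediction_and_labels}
    return firsts <= {0.0, 1.0}

print(looks_swapped([(0.0, 0.62), (1.0, 0.91)]))  # True -> suspicious
print(looks_swapped([(0.62, 0.0), (0.91, 1.0)]))  # False -> looks fine
```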