2
votes

How can I obtain the result of the evaluator in a Spark ML pipeline?

val evaluator = new BinaryClassificationEvaluator()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(10)

The result of the transform operation only contains the labels, probabilities, and predictions.

It is possible to obtain a "best model", but I would rather get the evaluation metrics.

Here https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-mllib/spark-mllib-evaluators.html they show how to use an evaluator without a pipeline.

None of these otherwise very interesting links displays the result of the evaluator at the end: neither https://benfradet.github.io/blog/2015/12/16/Exploring-spark.ml-with-the-Titanic-Kaggle-competition, nor https://developer.ibm.com/spark/blog/2016/02/22/predictive-model-for-online-advertising-using-spark-machine-learning-pipelines/, nor the official example at https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaCrossValidationExample.scala.

In fact one of the links calculates the metric by hand:

cvAccuracy = cvPrediction.filter(cvPrediction['label'] == cvPrediction['prediction']).count() / float(cvPrediction.count())

I would have expected to obtain the metrics at a per-fold level, or possibly as a mean / variance across folds.

1
Are you interested in the performance metric per paramGrid value? - mtoto
Sort of. I want to check whether parameter setting A or algorithm B is better than another setting / algorithm. - Georg Heiler

1 Answer

4
votes

CrossValidatorModel contains not only the model with the best average cross-validation metric across folds (bestModel), but also the average metric for each param map that was evaluated.

To grab these, you can use the getEstimatorParamMaps method in combination with avgMetrics, for example:

val cvModel = cv.fit(training)
cvModel.getEstimatorParamMaps.zip(cvModel.avgMetrics)
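
To make the zipped result readable, and to compare settings as asked in the comments, the pairs can be sorted and printed. The following is a minimal sketch written as a script (for example for spark-shell), not the only way to do it. The toy data, the LogisticRegression stage, the regParam grid, and the 3-fold setting are assumptions made up for illustration; only getEstimatorParamMaps, avgMetrics, and the evaluator come from the text above. Note that avgMetrics holds the mean over folds for each param map, not the individual per-fold values.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cv-metrics-sketch")
  .getOrCreate()

// Hypothetical toy data: the label is 1.0 when the first feature exceeds 0.5.
val rows = (0 until 200).map { i =>
  val x = i / 200.0
  (if (x > 0.5) 1.0 else 0.0, Vectors.dense(x, (i % 7).toDouble))
}
val data = spark.createDataFrame(rows).toDF("label", "features")
val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

// Assumed pipeline and grid; substitute your own stages and params.
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(lr))
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .build()

val evaluator = new BinaryClassificationEvaluator() // default metric: areaUnderROC
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val cvModel = cv.fit(training)

// avgMetrics has one entry per param map, in the same order as the grid.
// Sort descending because a larger areaUnderROC is better.
val ranked = cvModel.getEstimatorParamMaps
  .zip(cvModel.avgMetrics)
  .sortBy { case (_, metric) => -metric }

ranked.foreach { case (params, metric) =>
  println(s"${evaluator.getMetricName} = $metric for\n$params")
}

// The same evaluator can also score the best model on held-out data.
val testMetric = evaluator.evaluate(cvModel.transform(test))
println(s"${evaluator.getMetricName} on test set = $testMetric")
```

Comparing two algorithms rather than two settings of one algorithm works the same way: run one CrossValidator per algorithm with the same evaluator and folds, then compare their avgMetrics.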