
The old MLlib RDD-based API has evaluation metrics classes: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html However, the new DataFrame-based API does NOT have such classes: https://spark.apache.org/docs/latest/ml-guide.html

It has the Evaluator class but it is limited.

How can I evaluate a model in the new API using the metric class of the old one?


1 Answer


You can easily convert a DataFrame into an RDD via its .rdd attribute (in PySpark this is a property, not a method).

Hence, you can build the final model using the pyspark.ml library, compute predictions on the test data, then take testing_predictions.rdd, map its rows to (prediction, label) tuples, and pass that RDD to the metrics classes in pyspark.mllib.evaluation.