I need to use Spark with Scala to calculate mean average precision via RankingMetrics. According to the documentation, RankingMetrics expects an RDD rather than a DataFrame. Starting from a Spark DataFrame, I tried the following:
val llist = df.select("predicted", "actual").rdd.map(x => (x.get(0), x.get(1))).collect()
// llist: Array[(Any, Any)]
val dfRdd = sc.parallelize(llist)
// dfRdd: org.apache.spark.rdd.RDD[(Any, Any)]
val metrics = new RankingMetrics(dfRdd)
// This last line fails to compile
Error:
error: type mismatch;
found : org.apache.spark.rdd.RDD[(Any, Any)]
required: org.apache.spark.rdd.RDD[(Array[?], Array[?])]
Note: (Any, Any) >: (Array[?], Array[?]), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
I am using Spark version 2.4.3.
How can I convert this DataFrame into the required RDD[(Array[_], Array[_])] format so I can calculate mean average precision? Thanks.
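For context, here is a minimal sketch of the kind of typed conversion I believe RankingMetrics wants, assuming the "predicted" and "actual" columns are both array&lt;string&gt;. The sample data and column contents are hypothetical; the point is mapping each Row to a typed (Array[String], Array[String]) pair instead of (Any, Any), and keeping everything on the RDD rather than collecting and re-parallelizing:

```scala
import org.apache.spark.mllib.evaluation.RankingMetrics
import org.apache.spark.sql.SparkSession

object MapSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("mAP-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data standing in for the real DataFrame:
    // both columns are array<string>.
    val df = Seq(
      (Array("a", "b", "c"), Array("a", "c")),
      (Array("d", "e"), Array("e"))
    ).toDF("predicted", "actual")

    // Map each Row to a concrete (Array[String], Array[String]) pair,
    // so the RDD's element type matches RankingMetrics' expectation.
    val rankingRdd = df.select("predicted", "actual").rdd.map { row =>
      (row.getSeq[String](0).toArray, row.getSeq[String](1).toArray)
    }

    val metrics = new RankingMetrics(rankingRdd)
    println(metrics.meanAveragePrecision)

    spark.stop()
  }
}
```

If this is on the right track, the collect/parallelize round trip in my attempt is presumably unnecessary as well, since the map can run directly on the DataFrame's underlying RDD.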