I've been playing around with converting RDDs to DataFrames and back again. First, I had an RDD of type (Int, Int) called dataPair. Then I created a DataFrame object with column headers using:
val dataFrame = dataPair.toDF(header(0), header(1))
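For reference, here is a minimal self-contained sketch of this first step. It assumes Spark 1.4 or later with the `SQLContext` API used in the question, a local master, and hypothetical sample data and header values standing in for `dataPair` and `header`:

```scala
// Paste into spark-shell, or run with spark-core and spark-sql on the classpath.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("toDF-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)   // reuses an existing context if there is one
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._             // brings toDF into scope for RDDs of tuples/case classes

// Hypothetical column names and data standing in for header and dataPair.
val header = Array("first", "second")
val dataPair = sc.parallelize(Seq((1, 2), (3, 4), (5, 6)))   // RDD[(Int, Int)]

// toDF(colNames: String*) replaces the default tuple column names _1 and _2.
val dataFrame = dataPair.toDF(header(0), header(1))
dataFrame.printSchema()
```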
Then I converted it from a DataFrame back to an RDD using:
val testRDD = dataFrame.rdd
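Because a DataFrame's rows carry no static element type, `dataFrame.rdd` can only return `RDD[Row]`. A minimal sketch of recovering the original pairs, assuming the same hypothetical local setup and sample data as above, reads each field by position with `getInt` or pattern-matches with the `Row` extractor:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("row-sketch").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Hypothetical sample data with hypothetical column names.
val dataFrame = sc.parallelize(Seq((1, 2), (3, 4))).toDF("first", "second")

// .rdd loses the (Int, Int) type: every element is a generic Row.
val testRDD: RDD[Row] = dataFrame.rdd

// Recover typed pairs by reading each field by position...
val pairsByIndex: RDD[(Int, Int)] = testRDD.map(row => (row.getInt(0), row.getInt(1)))

// ...or by pattern matching on Row's extractor (throws MatchError on unexpected shapes).
val pairsByMatch: RDD[(Int, Int)] = testRDD.map { case Row(a: Int, b: Int) => (a, b) }
```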
which returns an RDD[org.apache.spark.sql.Row] rather than an RDD[(Int, Int)]. Now I'd like to convert it back to a DataFrame using .toDF, but I get an error:
error: value toDF is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
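A likely explanation: the implicit conversion that `sqlContext.implicits._` provides is defined for RDDs of case classes and tuples (and, depending on the Spark version, some primitive types), but not for `RDD[Row]`. One possible workaround, sketched below under the same hypothetical setup, passes the `Row`s together with an explicit schema to `sqlContext.createDataFrame`. Here the schema is reused from the original DataFrame, with a hand-built `StructType` (hypothetical column names) shown as an alternative:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("schema-sketch").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val dataFrame = sc.parallelize(Seq((1, 2), (3, 4))).toDF("first", "second")
val testRDD: RDD[Row] = dataFrame.rdd

// Option 1: reuse the schema of the DataFrame the rows came from.
val restored: DataFrame = sqlContext.createDataFrame(testRDD, dataFrame.schema)

// Option 2: describe the columns by hand (names and nullability are assumptions).
val schema = StructType(Seq(
  StructField("first", IntegerType, nullable = false),
  StructField("second", IntegerType, nullable = false)))
val restoredManual: DataFrame = sqlContext.createDataFrame(testRDD, schema)
```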
I've tried defining a schema type Data(Int, Int) and declaring testRDD as an RDD[Data], but I get a type mismatch error at compile time:
error: type mismatch;
found : org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
required: org.apache.spark.rdd.RDD[Data]
val testRDD: RDD[Data] = dataFrame.rdd
^
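A type annotation alone cannot turn `RDD[Row]` into `RDD[Data]`; each `Row` has to be mapped to a `Data` value explicitly, after which `toDF` should be available again because `Data` is a case class. A minimal sketch, assuming `Data` is a case class with hypothetical field names `first` and `second` and the same hypothetical local setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Hypothetical field names; the question only specifies Data(Int, Int).
case class Data(first: Int, second: Int)

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("case-class-sketch").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val dataFrame = sc.parallelize(Seq((1, 2), (3, 4))).toDF("first", "second")

// Convert each Row into a Data instance by position.
val testRDD: RDD[Data] = dataFrame.rdd.map(row => Data(row.getInt(0), row.getInt(1)))

// RDD[Data] holds a case class, so the implicit toDF conversion applies again;
// column names are taken from the case class fields.
val dataFrameAgain = testRDD.toDF()
```

In a compiled application (as opposed to spark-shell), define the case class at the top level rather than inside a method, so Spark can obtain the type information it needs to derive the schema.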
I've already imported the SQLContext implicits:
import sqlContext.implicits._