0
votes

How to convert a DataFrame to RDD[String, String]?

I have a DataFrame:

df: [id: String, country: String, title: String]

How do I convert it to an RDD[String, String], where the first column is the key and a JSON string built from the remaining columns is the value?

key : id
value : {country: "US", title: "MK"}

2 Answers

2
votes

You cannot have an RDD[String, String]. RDD takes only one type parameter, so what you want is an RDD[(String, String)], that is, an RDD of (key, value) tuples.

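// Convert each Row into an (id, JSON string) pair.
df.rdd
  .map(row => {
    val id = row.getString(0)
    val country = row.getString(1)
    val title = row.getString(2)

    // Quote keys and values so the result is valid JSON.
    // Values containing quotes or backslashes would still need escaping.
    val jsonString = s"""{"country": "$country", "title": "$title"}"""

    (id, jsonString)
  })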
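
One caveat with building the JSON by string interpolation: a value that contains a double quote, a backslash, or a control character will produce an invalid document. A minimal sketch of an escaping helper in plain Scala follows; the names `JsonEscape`, `escape` and `toJson` are hypothetical and not part of Spark, and a real JSON library would be a more robust choice.

```scala
object JsonEscape {
  // Escape the characters that must not appear raw inside a JSON string literal.
  def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt)) // other control characters
      case c            => sb.append(c)
    }
    sb.toString
  }

  // Build the value part of the pair from already-extracted column values.
  def toJson(country: String, title: String): String =
    s"""{"country": "${escape(country)}", "title": "${escape(title)}"}"""
}
```

Inside the `map`, the interpolated string could then be replaced with `JsonEscape.toJson(country, title)`. Null column values are not handled here and would need a separate check.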
0
votes

There is DataFrame.toJSON, which returns an RDD[String] with one JSON string per row (in Spark 2.0 and later it returns a Dataset[String]). Based on this method, you can do the transformation yourself.
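
A related way to let Spark do the serialization while keeping the key column separate is sketched below. It assumes Spark 2.1 or later, where `to_json` and `struct` are available in `org.apache.spark.sql.functions`; the object name `PairRddSketch` and the method `toPairRdd` are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object PairRddSketch {
  // Returns (key, JSON of all other columns) pairs.
  // Assumes column names contain no dots or other characters needing backticks.
  def toPairRdd(df: DataFrame, keyCol: String): RDD[(String, String)] = {
    // Every column except the key goes into the JSON value.
    val valueCols = df.columns.filter(_ != keyCol).map(col)

    df.select(
        col(keyCol).cast("string"),      // make sure the key can be read with getString
        to_json(struct(valueCols: _*))   // Spark handles quoting and escaping
      )
      .rdd
      .map(row => (row.getString(0), row.getString(1)))
  }
}
```

For the DataFrame in the question this would be called as `PairRddSketch.toPairRdd(df, "id")`. Because Spark serializes the struct, no manual quoting or escaping of the values is needed.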