
I have a DataFrame that I'm explicitly converting into an RDD, and I'm trying to read several columns from each record. I can't get all of them back from a single map. Here is what I've tried:

val df = sql("Select col1, col2, col3, col4, col5 from tableName").rdd

The resulting df is of type org.apache.spark.rdd.RDD[org.apache.spark.sql.Row].

Now I'm trying to access each element of this RDD via:

val dfrdd = df.map{x => x.get(0); x.getAs[String](1); x.get(3)}

The problem is that the statement above returns only the value of the last expression in the map, i.e., x.get(3). Can someone tell me what I'm doing wrong?


1 Answer


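In Scala, a block evaluates to its last expression, so the function passed to map returns only x.get(3). The results of x.get(0) and x.getAs[String](1) are computed and then discarded.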

To return multiple values, return a tuple, as below:

val dfrdd = df.map{x => (x.get(0), x.getAs[String](1), x.get(3))}
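
The difference can be reproduced without Spark. The sketch below is only an illustration: plain Scala collections stand in for the RDD[Row], and the object name BlockValueDemo and the sample data are made up for the example. The compiler may warn that the discarded expressions do nothing, which is exactly the point.

```scala
// Plain Scala collections stand in for the RDD[Row]; no Spark is needed.
object BlockValueDemo {
  // Hypothetical sample data: each inner Seq plays the role of a Row.
  val rows: Seq[Seq[Any]] = Seq(
    Seq(1, "a", 10.0, true),
    Seq(2, "b", 20.0, false)
  )

  // Three expressions in one block: the first two are evaluated and
  // discarded, and only the last one becomes the result of the function.
  val lastOnly: Seq[Any] = rows.map { x => x(0); x(1); x(3) }

  // A single tuple expression: all three values are kept.
  val allThree: Seq[(Any, Any, Any)] = rows.map { x => (x(0), x(1), x(3)) }

  def main(args: Array[String]): Unit = {
    println(lastOnly) // List(true, false)
    println(allThree) // List((1,a,true), (2,b,false))
  }
}
```

The same rule applies to the function passed to map on an RDD, which is why wrapping the values in a tuple fixes the original code.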

Hope this helped!