21
votes

I am working with Spark 2.0 and Scala. I am able to convert an RDD to a DataFrame using the toDF() method.

val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()

But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.

Also I can't see it under implicits (link 2).

So please help me understand why toDF() can be called on my RDD. Where is this method being inherited from?

4
Where are you calling this from? spark-shell? – Yuval Itzchakov
Yes. Just have a local Spark setup and running my Scala script using ./bin/spark-shell --master local[2] -i /pathtomyscale/myfile.scala – Carl

4 Answers

18
votes

It's coming from here:

Spark 2 API

Explanation: if you import sqlContext.implicits._, you get an implicit method that converts an RDD to a DatasetHolder (rddToDatasetHolder); toDF is then called on the DatasetHolder.
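The mechanism is ordinary Scala implicit conversion, not inheritance. Here is a minimal, Spark-free sketch of the same pattern; the names MyRDD, Holder and rddToHolder are hypothetical stand-ins for RDD, DatasetHolder and rddToDatasetHolder:

```scala
import scala.language.implicitConversions

// Toy stand-in for Spark's RDD (hypothetical name, for illustration only).
class MyRDD(val data: Seq[String])

// Plays the role of DatasetHolder: the wrapper that actually defines toDF.
class Holder(data: Seq[String]) {
  def toDF: String = s"DataFrame(${data.mkString(", ")})"
}

object Implicits {
  // Plays the role of SQLImplicits.rddToDatasetHolder: an implicit
  // conversion the compiler applies when toDF is called on a MyRDD.
  implicit def rddToHolder(rdd: MyRDD): Holder = new Holder(rdd.data)
}

object Demo {
  def main(args: Array[String]): Unit = {
    import Implicits._              // like import sqlContext.implicits._
    val rdd = new MyRDD(Seq("a", "b"))
    // MyRDD has no toDF, so the compiler rewrites this to rddToHolder(rdd).toDF
    println(rdd.toDF)               // prints: DataFrame(a, b)
  }
}
```

This is why toDF never shows up on RDD in the API docs: it only exists on the holder type, and the import brings the conversion into scope.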

3
votes

Yes, you should import the sqlContext implicits, like this:

val sqlContext = // create sqlContext

import sqlContext.implicits._

val df = rdd.toDF()

before you call toDF() on your RDDs.

2
votes

Yes, I finally found peace of mind on this issue. It was troubling me like hell; this post is a life saver. I was trying to generically load data from log files into case class objects held in a mutable list, the idea being to finally convert the list into a DataFrame. However, because the list was mutable (and Spark 2.1.1 changed the toDF implementation), the list was not getting converted. I was even about to resort to saving the data to a file and loading it back using .read, but five minutes ago this post saved my day.

I did the exact same way as described.

After loading the data into the mutable list, I immediately used:

import spark.sqlContext.implicits._
val df = <mutable list object>.toDF
df.show()

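If the implicits still refuse to apply to a mutable collection, a simple workaround is to snapshot it as an immutable List first and convert that instead. A plain-Scala sketch of the idea (LogEntry is a hypothetical record type standing in for the author's case class; in Spark you would call .toDF on the resulting list after importing the implicits):

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical record type, standing in for the case class built from log lines.
case class LogEntry(level: String, message: String)

object MutableListDemo {
  def main(args: Array[String]): Unit = {
    val buffer = ListBuffer[LogEntry]()
    buffer += LogEntry("INFO", "started")
    buffer += LogEntry("WARN", "low disk")

    // toList takes an immutable snapshot of the mutable buffer; in Spark,
    // entries.toDF() would then work after import spark.implicits._
    val entries: List[LogEntry] = buffer.toList
    println(entries.size)   // prints: 2
  }
}
```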
1
votes

I have done just this with Spark 2; it worked.

val orders = sc.textFile("/user/gd/orders")
val ordersDF = orders.toDF()