I am trying to create a DataFrame from an RDD[CassandraRow], but I can't, because createDataFrame(rdd: RDD[Row], schema: StructType) needs an RDD[Row], not an RDD[CassandraRow].

  • How can I achieve this?

Also, one of the answers to the question "How to convert rdd object to dataframe in spark" suggests calling toDF() on an RDD[Row] to get a DataFrame from the RDD, but that is not working for me. I tried calling toDF() on an RDD[Row] in another example.

  • It is also unclear to me how a DataFrame method (toDF()) can be called on an instance of an RDD (RDD[Row]).

I am using Scala.

Did you try: import sqlContext.implicits._ followed by rdd.toDF()? - doanduyhai
Yes, I did, but toDF() still doesn't show up in code completion (the list of available methods). - Parth Vishvajit
The import sqlContext.implicits._ is important to have access to all the extra methods (like toDF()). - doanduyhai
I know it is. I am adding that code to my question. Please help me if you can. Thanks. - Parth Vishvajit
val sqlContext = new org.apache.spark.sql.SQLContext(sc); import sqlContext.implicits._ - doanduyhai
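
The mechanism the comments refer to is Scala's implicit conversion: the import brings an implicit into scope that wraps the RDD in a helper type, and that helper is what defines toDF(). As far as I know, in the Spark 1.x API this implicit is declared only for RDDs whose elements are case classes or tuples (plus a few primitive types), which would explain why toDF() does not appear on an RDD[Row]. The following is a minimal plain-Scala sketch of the same idea. FakeRdd, FakeFrame and FakeImplicits are hypothetical stand-ins and not Spark classes.

```scala
// Stand-ins for Spark's RDD and DataFrame; every name here is hypothetical.
final case class FakeRdd[A](items: Seq[A])
final case class FakeFrame(columns: Seq[String], rows: Seq[Seq[Any]])

object FakeImplicits {
  // Adds toDF(...) to FakeRdd, but only when the element type is a Product
  // (a tuple or case class), so the fields can be read generically.
  implicit class ProductRddOps[A <: Product](rdd: FakeRdd[A]) {
    def toDF(names: String*): FakeFrame =
      FakeFrame(names.toList, rdd.items.map(_.productIterator.toList))
  }
}

object ImplicitDemo {
  import FakeImplicits._ // without this import, toDF is not in scope

  val pairs: FakeRdd[(Int, String)] = FakeRdd(Seq((1, "a"), (2, "b")))
  val frame: FakeFrame = pairs.toDF("id", "name")
  // FakeRdd(Seq(1, 2)).toDF("n") would not compile: Int is not a Product.
}
```

In this sketch the method is missing for any element type that does not satisfy the bound, even with the import in place, which mirrors the symptom described in the question.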

1 Answer

If you really need this, you can always map your data to Spark rows:

sqlContext.createDataFrame(
  rdd.map(r => org.apache.spark.sql.Row.fromSeq(r.columnValues)),
  schema
)

but if you want DataFrames, it is better to import the data directly:

val df = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> table, "keyspace" -> keyspace))
  .load()
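
For the first approach, the schema passed to createDataFrame has to be built by hand, and its fields must line up with the order of the values in each row. A minimal sketch, assuming a hypothetical table with an int column id and a text column name (neither name comes from the question):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical two-column schema; replace the names and types with those
// of the actual Cassandra table, in the same order as the row values.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))
```

With the second approach no hand-written schema is needed, since the data source derives it from the table definition.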