HiveContext in Spark Version 2

Question

I am working on a Spark program that inserts a DataFrame into a Hive table, as shown below.

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql._
val hiveCont = new org.apache.spark.sql.hive.HiveContext(sc)
val partfile = sc.textFile("partfile")
val partdata = partfile.map(p => p.split(","))
case class partc(id:Int, name:String, salary:Int, dept:String, location:String)
val partRDD  = partdata.map(p => partc(p(0).toInt, p(1), p(2).toInt, p(3), p(4)))
val partDF   = partRDD.toDF()
partDF.registerTempTable("party")
hiveCont.sql("insert into parttab select id, name, salary, dept from party")

I know that Spark 2 has been released and that it provides the SparkSession object. Can we use SparkSession to insert the DataFrame into the Hive table directly, or do we still have to use HiveContext in version 2? Can anyone tell me what the major differences are in version 2 with respect to HiveContext?

Raphael Roth · Accepted Answer · 2017-07-03T08:51:05

You can use your SparkSession (normally called spark or ss) directly to run a SQL query (make sure Hive support is enabled via enableHiveSupport() when creating the SparkSession):

spark.sql("insert into parttab select id, name, salary, dept from party")
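For reference, a minimal sketch of creating a SparkSession with Hive support, which takes over the role HiveContext had in Spark 1.x. The application name is a placeholder chosen for this example; in spark-shell a session named spark already exists, so getOrCreate() simply returns it.

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a session with Hive support enabled.
// "hive-insert-example" is a placeholder name, not something from the question.
val spark = SparkSession.builder()
  .appName("hive-insert-example")
  .enableHiveSupport()
  .getOrCreate()

// SQL against Hive tables now goes through the session itself.
spark.sql("show tables").show()
```

When the program is launched with spark-submit, the master is normally supplied on the command line rather than hard-coded in the builder.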

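However, I would suggest the following notation instead, since it does not require a temp table. Note that mode("overwrite") replaces the data already in the table; use mode("append") if you want the same behaviour as the insert into statement: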

partDF
.select("id","name","salary","dept")
.write.mode("overwrite")
.insertInto("parttab")
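Putting the pieces together, the program from the question could look roughly like this in Spark 2. This is a sketch, not a tested migration: it assumes the Hive table parttab already exists with the columns id, name, salary, dept in that order, and it renames the case class to Part purely for readability.

```scala
import org.apache.spark.sql.SparkSession

// Same fields as the case class in the question (renamed for this example).
case class Part(id: Int, name: String, salary: Int, dept: String, location: String)

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
import spark.implicits._ // provides .toDF() on RDDs of case classes

// Read and parse the input file exactly as in the question.
val partDF = spark.sparkContext
  .textFile("partfile")
  .map(_.split(","))
  .map(p => Part(p(0).toInt, p(1), p(2).toInt, p(3), p(4)))
  .toDF()

// insertInto matches columns by position, not by name,
// so select them in the target table's column order.
// "append" mirrors the SQL "insert into" statement.
partDF
  .select("id", "name", "salary", "dept")
  .write.mode("append")
  .insertInto("parttab")
```

No HiveContext and no temp table are needed here; the SparkSession provides both the SparkContext (via spark.sparkContext) and access to the Hive catalog.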
