
I am trying to load a Hive table from a Spark program. Until now, I have used the Spark shell to load data into Hive tables. After learning how that works, I wrote the Spark program below in Eclipse.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SaveMode

object SuperSpark {
  case class partclass(id: Int, name: String, salary: Int, dept: String, location: String)
  def main(argds: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    val sparkSession = SparkSession.builder.master("local[2]").appName("Saving data into HiveTable using Spark")
                        .enableHiveSupport()
                        .config("hive.exec.dynamic.partition", "true")
                        .config("hive.exec.dynamic.partition.mode", "nonstrict")
                        .config("hive.metastore.warehouse.dir", "/user/hive/warehouse")
                         .config("spark.sql.warehouse.dir", warehouseLocation)
                        .getOrCreate()
    import sparkSession.implicits._

    // read the comma-separated input file, split each line, and map it onto the case class
    val partfile = sparkSession.read.textFile("partfile")
    val partdata = partfile.map(p => p.split(","))
    val partRDD  = partdata.map(line => partclass(line(0).toInt, line(1), line(2).toInt, line(3), line(4)))
    val partDF   = partRDD.toDF()

    // append the rows to the existing Hive table
    partDF.write.mode(SaveMode.Append).insertInto("parttab")
  }
}

The points I am confused about are:

  1. Where should I add the database details (hostname/IP address, port number, database name) in the program?
  2. I am using Spark version 2.1.1; that is what the release notes in '/usr/local/spark' say (Spark 2.1.1 built for Hadoop 2.6.4). Do I need to use the HiveContext object to interact with Hive tables?

These are the dependencies in my pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
</dependency>

Could anyone tell me how I can proceed further?

Your code isn't compatible with Spark 1.6, and all your pom dependencies point to Spark 2.1.1. This code won't run on Spark 1.6 because SparkSession doesn't exist in 1.6. Answering the second part of your question: yes, you need to use HiveContext. More details in this question - philantrovert
@philantrovert I have updated the version details in the question; I got that information from the release notes file in the folder '/usr/local/spark'. Is my code compatible with the version I have mentioned? If so, what changes need to be made in the program? - Metadata
Yeah, your code is fine if you're using Spark 2.1. Follow the link mentioned in my previous comment and you'll find more details there on how to save a table to Hive. - philantrovert

1 Answer


I think you need to provide the metastore URIs. You have two options:

  • Use a hive-site.xml on the classpath of the machine from which you run your Spark application (if you are following a standard Maven structure, you can place it in the resources folder):

    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://192.168.1.134:9083</value>
        </property>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
        </property>
    </configuration>

  • In your Spark code, configure your SparkSession object with a property like this:

    .config("hive.metastore.uris", "thrift://192.168.1.134:9083")