
I'm trying to connect from Spark, running in IntelliJ, to the Hive warehouse directory, which is located at the following path:

hdfs://localhost:9000/user/hive/warehouse

To do this, I'm using the following code:

import org.apache.spark.sql.SparkSession

// warehouseLocation points to the default location for managed databases and tables
val warehouseLocation = "hdfs://localhost:9000/user/hive/warehouse"

val spark = SparkSession
 .builder()
 .appName("Spark Hive Local Connector")
 .config("spark.sql.warehouse.dir", warehouseLocation)
 .config("spark.master", "local")
 .enableHiveSupport()
 .getOrCreate()

spark.catalog.listDatabases().show(false)
spark.catalog.listTables().show(false)
println(spark.conf.getAll.mkString("\n"))

import spark.implicits._
import spark.sql

sql("USE test")
sql("SELECT * FROM test.employee").show()

As you can see, I created a database 'test' and a table 'employee' in that database using the Hive console. I want to get the result of the last query.

The 'spark.catalog.' and 'spark.conf.' calls are there to print the warehouse path and the database settings.

spark.catalog.listDatabases().show(false) gives me :

  • name : default
  • description : Default Hive database
  • locationUri : hdfs://localhost:9000/user/hive/warehouse

spark.catalog.listTables.show(false) gives me an empty result. So something is wrong at this step.

At the end of the job's execution, I obtained the following error:

> Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found;

I have also configured the hive-site.xml file for the Hive warehouse location :

<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://localhost:9000/user/hive/warehouse</value>
</property>

I have already created the database 'test' using the Hive console.

Here are the versions of my components:

  • Spark : 2.2.0
  • Hive : 1.1.0
  • Hadoop : 2.7.3

Any ideas?

I don't think you need the port in the path. Use this: val warehouseLocation = "hdfs:///user/hive/warehouse" - Gaurang Shah
I have already tried that and I obtained the same error. - Mamaf
Do you have the test database? - Gaurang Shah
Yes, here is the result of the 'show databases' command: default test Time taken: 0.542 seconds, Fetched 2 rows - Mamaf

1 Answer

Create a resources directory under src in your IntelliJ project and copy the Hive conf files into it, then build the project. Make sure the hive.metastore.uris value is defined correctly; refer to your hive-site.xml. If the log shows "INFO metastore: Connected to metastore", you are good to go.
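
For illustration, here is roughly what the relevant entry in the copied hive-site.xml could look like. This is only a sketch: it assumes the property meant above is hive.metastore.uris (the standard Hive setting for the metastore's Thrift address), and that a metastore service is running on localhost on the default port 9083 (for example, started with `hive --service metastore`). The host and port are assumptions and should be adjusted to your setup.

```xml
<!-- Assumed values: metastore service on localhost, default Thrift port 9083 -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
```

Without a hive-site.xml on the classpath, Spark falls back to its own local embedded metastore, which would explain why only the 'default' database is listed and 'test' is not found.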

Note that connecting from IntelliJ and running the job there will be slow compared to packaging the jar and running it on your Hadoop cluster.