
I'm trying to use IntelliJ to test Spark Scala code that needs to create a Hive table. I have installed Hive locally on my Mac, using MySQL as the metastore database. I'm able to create a Hive table from spark-shell with

sqlContext.sql("CREATE TABLE IF NOT EXISTS employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")

But the same command inside a Scala program run from IntelliJ, even though it completes successfully, does not actually create any table in the Hive metastore.

val spark = SparkSession.builder
  .appName("BiddingExternalTable")
  .master("local")
  .enableHiveSupport()
  .getOrCreate()

spark.sqlContext.sql("CREATE TABLE IF NOT EXISTS employeeExternal(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")

Looking at the console output, the Spark session inside IntelliJ is still using the default Derby metastore.

19/05/02 17:40:06 INFO SharedState: Warehouse path is 'file:/Users/sichu/src/MktDataSSS/spark-warehouse/'.
19/05/02 17:40:07 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/05/02 17:40:09 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
19/05/02 17:40:09 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
19/05/02 17:40:09 INFO ObjectStore: ObjectStore, initialize called
19/05/02 17:40:10 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
19/05/02 17:40:10 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
19/05/02 17:40:10 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
19/05/02 17:40:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/05/02 17:40:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/02 17:40:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/05/02 17:40:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/05/02 17:40:11 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/05/02 17:40:11 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

This happens despite my having added the JDBC driver (and its folder) to the CLASSPATH, and despite placing hive-site.xml in the Hadoop conf directory. That hive-site.xml is picked up by spark-shell successfully, but not when running a Scala program from inside IntelliJ.
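One quick way to diagnose this (a plain-JVM sketch, no Spark required) is to check from inside the IntelliJ run configuration whether hive-site.xml is actually visible on the runtime classpath; Spark can only read the file if this lookup succeeds:

```scala
// Sanity check: is hive-site.xml visible on the runtime classpath?
// If this prints NOT on classpath, Spark will silently fall back to Derby.
val hiveSite = getClass.getClassLoader.getResource("hive-site.xml")
println(
  if (hiveSite == null) "hive-site.xml NOT on classpath"
  else s"found: $hiveSite"
)
```

Run this with the same run configuration as the failing program; spark-shell and IntelliJ build different classpaths, which is why one sees the file and the other does not.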

Can someone help me connect my Spark job inside IntelliJ to the MySQL-backed Hive metastore I set up on my local machine? Thanks!

How do you handle your dependencies? Maven? Gradle? sbt? - howie
I'm using sbt. Thanks for the hint. I tried to add the config directory to the CLASSPATH with unmanagedClasspath in Runtime += baseDirectory.value / "config". For some reason, this did not work. - Sifang
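For an sbt project, a sketch of the usual fix (the "config" directory name is taken from the comment above; adjust to your layout): either copy hive-site.xml into src/main/resources, which sbt puts on the runtime classpath automatically, or declare the directory as an unmanaged resource directory rather than an unmanaged classpath entry, since IntelliJ's runner respects resource directories:

    // build.sbt (sbt 0.13-style syntax, matching the comment above)
    // Copies files from ./config onto the compile (and hence runtime) classpath
    unmanagedResourceDirectories in Compile += baseDirectory.value / "config"

After changing build.sbt, re-import the project in IntelliJ so the run configuration picks up the new classpath.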

1 Answer


You should set your metastore location explicitly when building the SparkSession. Note that hive.metastore.uris expects a Thrift URI of a running metastore service; a JDBC URL belongs in javax.jdo.option.ConnectionURL instead:

val spark = SparkSession
  .builder()
  .master("yarn")
  .appName("Test Hive Support")
  // hive.metastore.uris must point at a running Hive metastore service
  .config("hive.metastore.uris", "thrift://localhost:9083")
  // or connect the embedded metastore directly to the MySQL database:
  // .config("javax.jdo.option.ConnectionURL", "jdbc:mysql://localhost/metastore")
  .enableHiveSupport()
  .getOrCreate()
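Alternatively, keep the SparkSession code unchanged and make sure the hive-site.xml that spark-shell already uses reaches IntelliJ's runtime classpath (e.g. via src/main/resources). A minimal sketch of such a file, assuming a local MySQL database named "metastore" and placeholder credentials:

    <!-- hive-site.xml: MySQL-backed metastore (all values are examples) -->
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost/metastore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hiveuser</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hivepassword</value>
      </property>
    </configuration>

With this file on the classpath, Spark connects its embedded metastore client straight to MySQL, and no separate Thrift metastore service is needed.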