Spark build in hive MySQL metastore isn't being used

Question

I'm using Apache Spark 2.1.1 and I have put the following hive-site.xml on $SPARK_HOME/conf folder:

<?xml version="1.0"?>
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql_server:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>${test.tmp.dir}/hadoop-tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://hadoop_namenode:9000/value_iq/hive_warehouse/</value>
  <description>Warehouse Location</description>
</property>
</configuration>

When I start the thrift server the metastore schema is created on my MySQL DB but is not used, instead Derby is used.

Could not find any error on the thrift server log file, the only thing that calls my attentions is that it attempts to use MySQL at first (INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL) but then without any error use Derby instead (INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY). This is the thrift server log https://www.dropbox.com/s/rxfwgjm9bdccaju/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-s-master.value-iq.com.out?dl=0

I have no hive installed on my system, I just pretend to use the built in Hive of Apache Spark.

I'm using mysql-connector-java-5.1.23-bin.jar which is located on $SPARK_HOME/jars folder.

set, value of hive.metastore.schema.verification to false in hive-site.xml in both hive and spark conf and restart services and try again — Noman Khan
I have set it and same behavior. When you say in both hive and spark you mean in $SPARK_HOME/conf/hive-site.xml and $SPARK_HOME/conf/spark-defaults.conf? Remember I don't have hive installed, I'm using Spark built in Hive. — José

user1314742 user1314742 · Accepted Answer · 2017-07-31T11:08:27

As it appears in the hive-site.xml you have not set the metastore service to connect to. So spark will use the default one which is local metastore service with derby DB backend
I order to use Metastore service that has MySQL DB as its backend, you have to :

Start the metastore service. you can have a look here how to start the service hive metastore admin manual. You start your metastore service with the backend of MySQL DB, using your same hive-site.xml and you add the folowing lines to start the metastore service on METASTORESERVER on the port XXXX:
```
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://METASTRESERVER:XXXX</value>
</property>
```
Let spark knows where the metastore service has started. That could be done using the same hive-site.xml you'have used when starting the metastore service (with the lines above added to it) copy this file into the configuration path of Spark, then restart your spark thrift server

Spark build in hive MySQL metastore isn't being used

1 Answers