1
votes

I'm using Apache Spark 2.1.1 and I'd like to set it up with an external Hive metastore (for Spark Thrift Server specifically).

I have added hive-site.xml to $SPARK_HOME/conf folder that is as follows:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://home.cu:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=Europe/Berlin&amp;nullNamePatternMatchesAll=true </value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://spark-master.cu:9000/value_iq/hive_warehouse/</value>
    <description>Warehouse Location</description>
  </property>
</configuration>

Whenever I try to run spark-shell or Spark Thrift Server they attempt to create the Hive metastore on MySQL (as there is no metastore yet) and they fails with the following error:

17/07/13 19:57:55 ERROR Datastore: Error thrown executing ALTER TABLE `PARTITIONS` ADD COLUMN `TBL_ID` BIGINT NULL : Table 'hive_metastore.partitions' doesn't exist
java.sql.SQLSyntaxErrorException: Table 'hive_metastore.partitions' doesn't exist
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:536)
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:513)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:115)
        at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1983)
        at com.mysql.cj.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:1936)
        at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:891)
        at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:795)
        at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:711)
        at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:259)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3393)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
        at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
        at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
        at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:947)
        at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:370)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
        at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
        at org.datanucleus.store.query.Query.execute(Query.java:1654)
        at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
4

4 Answers

1
votes

I have found the problem, it was related with the MySQL driver, I was using mysql-connector-java-6.0.6-bin.jar and I have replaced it with and old one mysql-connector-java-5.1.23-bin.jar and now it works.

0
votes

I don't think that your warehouse dir property is configured properly, it should be a path on HDFS

<configuration>
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://maprdemo:9083</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>

0
votes

I've tried with different versions of spark to migrate the metadata management from derby to mysql/postgres and I'm successful with spark-2.2.1 onwards. The before versions wouldn't let me migrate to other databases. It's purely version dependency that they have with the spark. My suggestion try upgrading the spark version to 2.2.1 and you should be good to go!!

0
votes

For all those that try use hive metastore with mysql driver 8.x and downgrading to 5.x helps:

Another solution is to configure URL with mysql options:

connectionURL: "jdbc:mysql://localhost:3306/hive_db?databaseTerm=SCHEMA&nullDatabaseMeansCurrent=true"

From release notes of mysql connector 8

A new connection property, databaseTerm, sets which of the two terms is used in an application to refer to a database. The property takes one of the two values CATALOG or SCHEMA and uses it to determine which Connection methods can be used to set/get the current database, which arguments can be used within the various DatabaseMetaData methods to filter results, and which fields in the ResultSet returned by DatabaseMetaData methods contain the database identification information. See the entry for databaseTerm in Configuration Properties for details. Also, the connection property nullCatalogMeansCurrent has been renamed to nullDatabaseMeansCurrent. The old name remains an alias for the connection property. Thanks to Harald Aamot for contributing to the patch. (Bug #11891000, Bug #27356869, Bug #89133)

For the interested: the problem we've seen is that hive tries to parse all schemas in mysql and it has no permissions doing so. Limiting it to current database(hive_db specified in URL) solved problem for us