5
votes

We have upgraded our HDP cluster to 3.1.1.3.0.1.0-187 and have discovered:

  1. Hive has a new metastore location
  2. Spark can't see Hive databases

In fact we see:

org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database ... not found

Could you help me understand what has happened and how to solve this?

Update:

Configuration:

(spark.sql.warehouse.dir,/warehouse/tablespace/external/hive/)
(spark.admin.acls,)
(spark.yarn.dist.files,file:///opt/folder/config.yml,file:///opt/jdk1.8.0_172/jre/lib/security/cacerts)
(spark.history.kerberos.keytab,/etc/security/keytabs/spark.service.keytab)
(spark.io.compression.lz4.blockSize,128kb)
(spark.executor.extraJavaOptions,-Djavax.net.ssl.trustStore=cacerts)
(spark.history.fs.logDirectory,hdfs:///spark2-history/)
(spark.io.encryption.keygen.algorithm,HmacSHA1)
(spark.sql.autoBroadcastJoinThreshold,26214400)
(spark.eventLog.enabled,true)
(spark.shuffle.service.enabled,true)
(spark.driver.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.ssl.keyStore,/etc/security/serverKeys/server-keystore.jks)
(spark.yarn.queue,default)
(spark.jars,file:/opt/folder/component-assembly-0.1.0-SNAPSHOT.jar)
(spark.ssl.enabled,true)
(spark.sql.orc.filterPushdown,true)
(spark.shuffle.unsafe.file.output.buffer,5m)
(spark.yarn.historyServer.address,master2.env.project:18481)
(spark.ssl.trustStore,/etc/security/clientKeys/all.jks)
(spark.app.name,com.company.env.component.MyClass)
(spark.sql.hive.metastore.jars,/usr/hdp/current/spark2-client/standalone-metastore/*)
(spark.io.encryption.keySizeBits,128)
(spark.driver.memory,2g)
(spark.executor.instances,10)
(spark.history.kerberos.principal,spark/[email protected])
(spark.unsafe.sorter.spill.reader.buffer.size,1m)
(spark.ssl.keyPassword,*********(redacted))
(spark.ssl.keyStorePassword,*********(redacted))
(spark.history.fs.cleaner.enabled,true)
(spark.shuffle.io.serverThreads,128)
(spark.sql.hive.convertMetastoreOrc,true)
(spark.submit.deployMode,client)
(spark.sql.orc.char.enabled,true)
(spark.master,yarn)
(spark.authenticate.enableSaslEncryption,true)
(spark.history.fs.cleaner.interval,7d)
(spark.authenticate,true)
(spark.history.fs.cleaner.maxAge,90d)
(spark.history.ui.acls.enable,true)
(spark.acls.enable,true)
(spark.history.provider,org.apache.spark.deploy.history.FsHistoryProvider)
(spark.executor.extraLibraryPath,/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64)
(spark.executor.memory,2g)
(spark.io.encryption.enabled,true)
(spark.shuffle.file.buffer,1m)
(spark.eventLog.dir,hdfs:///spark2-history/)
(spark.ssl.protocol,TLS)
(spark.dynamicAllocation.enabled,true)
(spark.executor.cores,3)
(spark.history.ui.port,18081)
(spark.sql.statistics.fallBackToHdfs,true)
(spark.repl.local.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.ssl.trustStorePassword,*********(redacted))
(spark.history.ui.admin.acls,)
(spark.history.kerberos.enabled,true)
(spark.shuffle.io.backLog,8192)
(spark.sql.orc.impl,native)
(spark.ssl.enabledAlgorithms,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
(spark.sql.orc.enabled,true)
(spark.yarn.dist.jars,file:///opt/folder/postgresql-42.2.2.jar,file:///opt/folder/ojdbc6.jar)
(spark.sql.hive.metastore.version,3.0)

And from hive-site.xml:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>

Code looks like:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession
  .builder()
  .appName(getClass.getSimpleName)
  .enableHiveSupport()
  .getOrCreate()
...
dataFrame.write
  .format("orc")
  .options(Map("spark.sql.hive.convertMetastoreOrc" -> true.toString))
  .mode(SaveMode.Append)
  .saveAsTable("name")

Spark-submit:

    --master yarn \
    --deploy-mode client \
    --driver-memory 2g \
    --driver-cores 4 \
    --executor-memory 2g \
    --num-executors 10 \
    --executor-cores 3 \
    --conf "spark.dynamicAllocation.enabled=true" \
    --conf "spark.shuffle.service.enabled=true" \
    --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=cacerts" \
    --conf "spark.sql.warehouse.dir=/warehouse/tablespace/external/hive/" \
    --jars postgresql-42.2.2.jar,ojdbc6.jar \
    --files config.yml,/opt/jdk1.8.0_172/jre/lib/security/cacerts \
    --verbose \
    component-assembly-0.1.0-SNAPSHOT.jar
2
Could you try passing the hive-site.xml location to spark-submit with the --files option? - Avishek Bhattacharya
Can you check the value of spark.sql.warehouse.dir and perhaps hive.metastore.warehouse.dir? Could you include the Environment tab from the web UI in the question? You can always put hive-site.xml on the CLASSPATH to point to the directory. - Jacek Laskowski
BTW I can't seem to find this version of HDP at docs.hortonworks.com. The latest seems to be HDP-3.0.1. I'm a bit confused. - Jacek Laskowski
Thank you for the quick response, guys. Jacek, it is this build: repo.hortonworks.com/content/repositories/releases/org/apache/… - Eugene Lopatkin
How do you access a Hive table? Can you show the exact query (e.g. spark.read...)? What's the directory of the Hive warehouse? Can you check all the HADOOP_-, YARN_- or HIVE_-related environment variables? - Jacek Laskowski

2 Answers

5
votes

It looks like this is a Spark feature that has not been implemented yet. The only way I have found to use Spark with Hive since version 3.0 is the HiveWarehouseConnector from Hortonworks. Documentation is here, and there is a good guide from the Hortonworks Community here. I will leave the question unanswered until the Spark developers have prepared their own solution.
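For illustration, here is a minimal sketch of what going through the HiveWarehouseConnector might look like in Scala. It is based on the connector API as documented for HDP 3.x (`HiveWarehouseSession`, `showDatabases`, `executeQuery`, and the connector data source name), so check it against the version on your cluster. The object name `HwcSketch` and the names `some_db`, `some_table` and `target_table` are placeholders invented for this example. It also assumes that the connector assembly jar has been passed to spark-submit and that the connector settings from the Hortonworks documentation (for example `spark.sql.hive.hiveserver2.jdbc.url`) are already configured.

```scala
import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession

object HwcSketch {
  // Data source name registered by the connector (per the HDP 3.x documentation)
  val HwcFormat = "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector"

  def main(args: Array[String]): Unit = {
    // No enableHiveSupport() here: Hive access goes through the connector instead
    val spark = SparkSession.builder().appName("HwcSketch").getOrCreate()

    // Build a connector session on top of the existing SparkSession
    val hive = HiveWarehouseSession.session(spark).build()

    // List the databases known to the Hive 3 metastore
    hive.showDatabases().show(100, false)

    // Read through HiveServer2 (placeholder database and table names)
    val df = hive.executeQuery("SELECT * FROM some_db.some_table")

    // Write back to a Hive managed table through the connector (placeholder table name)
    df.write
      .format(HwcFormat)
      .option("table", "some_db.target_table")
      .save()

    spark.stop()
  }
}
```

Compared with the code in the question, `saveAsTable` is replaced by a write through the connector's data source, because Spark's own catalog and the Hive 3 catalog are separate in this setup.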

0
votes

I've got a bit of a throwback trick for this one. Disclaimer: it bypasses the Ranger permissions (don't blame me if you incur the wrath of an admin).

To use with spark-shell:

export HIVE_CONF_DIR=/usr/hdp/current/hive-client/conf
spark-shell --conf "spark.driver.extraClassPath=/usr/hdp/current/hive-client/conf"
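To check whether the session actually picked up the Hive configuration, something like the following sketch could be pasted into the shell (with `:paste`). The object name `HiveVisibilityCheck` is made up for this example; the calls themselves are standard `SparkSession` API. If the databases from Hive show up in the output, the `NoSuchDatabaseException` from the question should no longer occur for them.

```scala
import org.apache.spark.sql.SparkSession

object HiveVisibilityCheck {
  // Prints the catalog settings and returns the database names visible to this session
  def visibleDatabases(spark: SparkSession): Seq[String] = {
    // "hive" means the session uses the Hive metastore catalog, "in-memory" means it does not
    println("catalog: " + spark.conf.get("spark.sql.catalogImplementation"))
    println("warehouse dir: " + spark.conf.get("spark.sql.warehouse.dir"))
    spark.catalog.listDatabases().collect().map(_.name).toSeq
  }
}

// In spark-shell, where `spark` is predefined:
// HiveVisibilityCheck.visibleDatabases(spark).foreach(println)
```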

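To use with sparklyr:

library(sparklyr)
Sys.setenv(HIVE_CONF_DIR = "/usr/hdp/current/hive-client/conf")
conf <- spark_config()
# Put the Hive client configuration directory on the driver classpath
conf$`sparklyr.shell.driver-class-path` <- "/usr/hdp/current/hive-client/conf"
sc <- spark_connect(master = "yarn", config = conf)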

It should work for the Thrift server too, but I have not tested it.