
I’m trying to integrate Spark (3.1.1) with a local Hive metastore (3.1.2) in order to use spark-sql.

I configured spark-defaults.conf according to https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html, and the Hive jar files exist in the correct path.

However, an exception occurred when executing spark.sql("show tables").show, as shown below.

Any mistakes, hints, or corrections would be appreciated.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show tables").show
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
  at java.lang.Class.getDeclaredConstructors0(Native Method)
  at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
  at java.lang.Class.getConstructors(Class.java:1651)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:291)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:492)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)

spark-defaults.conf

spark.master                            yarn
spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs://192.168.5.130:9000/spark
spark.history.fs.logDirectory           hdfs://192.168.5.130:9000/spark
spark.history.provider                  org.apache.spark.deploy.history.FsHistoryProvider
spark.yarn.historyServer.address        http://192.168.5.130:8188
spark.yarn.historyServer.allowTracking  true

spark.sql.uris                          thrift://192.168.5.130:10000
spark.sql.warehouse.dir                 /user/hive/warehouse
spark.sql.hive.metastore.jars           path
spark.sql.hive.metastore.jars.path      file:///usr/local/hive/lib/*.jar
spark.sql.hive.metastore.version        3.1.2
spark.sql.hive.metastore.sharedPrefixes org.postgresql
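
For reference, the same metastore settings can also be applied programmatically when building the session. This is only a minimal sketch using the values from the conf above; the app name is made up, and everything else (host, paths, versions) is taken from this question:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the metastore settings from spark-defaults.conf above,
// applied via the SparkSession builder instead. The app name is arbitrary;
// hosts, paths, and versions are the ones from this question.
val spark = SparkSession.builder()
  .appName("hive-metastore-check")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .config("spark.sql.hive.metastore.version", "3.1.2")
  .config("spark.sql.hive.metastore.jars", "path")
  .config("spark.sql.hive.metastore.jars.path", "file:///usr/local/hive/lib/*.jar")
  .config("spark.sql.hive.metastore.sharedPrefixes", "org.postgresql")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show tables").show()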

ls /usr/local/hive/lib | grep hive

hive-accumulo-handler-3.1.2.jar
hive-beeline-3.1.2.jar
hive-classification-3.1.2.jar
hive-cli-3.1.2.jar
hive-common-3.1.2.jar
hive-contrib-3.1.2.jar
hive-druid-handler-3.1.2.jar
hive-exec-3.1.2.jar
hive-hbase-handler-3.1.2.jar
hive-hcatalog-core-3.1.2.jar
hive-hcatalog-server-extensions-3.1.2.jar
hive-hplsql-3.1.2.jar
hive-jdbc-3.1.2.jar
hive-jdbc-handler-3.1.2.jar
hive-kryo-registrator-3.1.2.jar
hive-llap-client-3.1.2.jar
hive-llap-common-3.1.2.jar
hive-llap-common-3.1.2-tests.jar
hive-llap-ext-client-3.1.2.jar
hive-llap-server-3.1.2.jar
hive-llap-tez-3.1.2.jar
hive-metastore-3.1.2.jar
hive-serde-3.1.2.jar
hive-service-3.1.2.jar
hive-service-rpc-3.1.2.jar
hive-shims-0.23-3.1.2.jar
hive-shims-3.1.2.jar
hive-shims-common-3.1.2.jar
hive-shims-scheduler-3.1.2.jar
hive-standalone-metastore-3.1.2.jar
hive-storage-api-2.7.0.jar
hive-streaming-3.1.2.jar
hive-testutils-3.1.2.jar
hive-upgrade-acid-3.1.2.jar
hive-vector-code-gen-3.1.2.jar

hive-site.xml

<configuration>
  <property>
     <name>javax.jdo.option.ConnectionURL</name>
     <value>jdbc:postgresql://192.168.5.130:5432/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
     <name>javax.jdo.option.ConnectionDriverName</name>
     <value>org.postgresql.Driver</value></property>
  <property>
     <name>javax.jdo.option.ConnectionUserName</name>
     <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://192.168.5.130:9000/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
     <name>datanucleus.autoCreateSchema</name>
     <value>false</value>
  </property>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/local/hive/lib</value> 
   </property>
</configuration>
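
As a quick sanity check that the metastore database from hive-site.xml is reachable at all, a plain JDBC probe can be run from the spark-shell. A minimal sketch, assuming the PostgreSQL driver jar is on the driver classpath; the URL and credentials are the ones from the hive-site.xml above:

import java.sql.DriverManager

// Probe the metastore backing database directly (values from hive-site.xml).
val conn = DriverManager.getConnection(
  "jdbc:postgresql://192.168.5.130:5432/hive", "hive", "hive")
println(conn.getMetaData.getDatabaseProductVersion) // e.g. the Postgres version
conn.close()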

After copying hive-site.xml to $SPARK_HOME/conf, a NoClassDefFoundError occurred for org/apache/commons/collections/CollectionUtils, as shown below.

spark.sql("show tables").show

scala> spark.sql("show tables").show
21/05/24 00:49:58 ERROR FileUtils: The jar file path file:///usr/local/hive/lib/*.jar doesn't exist
Hive Session ID = a6d63a41-e235-4d8c-a660-6f7b1a22996b
21/05/24 00:49:59 WARN ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
... (the same MetaData warning repeated eleven more times)
21/05/24 00:50:02 WARN Hive: Failed to register all functions.
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:4299)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4367)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4347)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4603)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:291)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:274)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:435)
        at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:375)
        at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:355)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:331)
        at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:257)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
        at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:384)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:134)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:154)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:152)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:946)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:932)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:924)
        at org.apache.spark.sql.execution.command.ShowTablesCommand.$anonfun$run$43(tables.scala:868)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.command.ShowTablesCommand.run(tables.scala:868)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
        at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
        at $line14.$read$$iw$$iw$$iw.<init>(<console>:36)
        at $line14.$read$$iw$$iw.<init>(<console>:38)
        at $line14.$read$$iw.<init>(<console>:40)
        at $line14.$read.<init>(<console>:42)
        at $line14.$read$.<init>(<console>:46)
        at $line14.$read$.<clinit>(<console>)
        at $line14.$eval$.$print$lzycompute(<console>:7)
        at $line14.$eval$.$print(<console>:6)
        at $line14.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
        at org.apache.spark.repl.Main$.doMain(Main.scala:78)
        at org.apache.spark.repl.Main$.main(Main.scala:58)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
        ... 101 more
Caused by: MetaException(message:org/apache/commons/collections/CollectionUtils)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:94)
        ... 106 more
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/collections/CollectionUtils
        at org.apache.hadoop.hive.metastore.ObjectStore.grantPrivileges(ObjectStore.java:5709)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
        at com.sun.proxy.$Proxy39.grantPrivileges(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles_core(HiveMetaStore.java:828)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles(HiveMetaStore.java:794)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:539)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
        ... 110 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.collections.CollectionUtils
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:247)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:236)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 127 more


repeated logs ...

This error can be fixed by copying $SPARK_HOME/jars/commons-collections-3.2.2.jar to $HIVE_HOME/lib.
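
A quick way to confirm the copy took effect, from the spark-shell. Note that Class.forName only probes the driver classpath; the isolated Hive client loader scans spark.sql.hive.metastore.jars.path, so re-running the query is the real test:

// Should no longer throw once commons-collections is visible.
Class.forName("org.apache.commons.collections.CollectionUtils")
spark.sql("show tables").show()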


1 Answer


It seems your Hive conf is missing. To connect to the Hive metastore, you need to copy the hive-site.xml file into the $SPARK_HOME/conf directory.

Try

cp /usr/lib/hive/conf/hive-site.xml ${SPARK_HOME}/conf/
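
Then restart the spark-shell and sanity-check that Hive support picked up the file. A quick check, assuming standard Spark property names:

// "hive" means Spark is using the Hive external catalog.
spark.conf.get("spark.sql.catalogImplementation")
// Should now go through the metastore described in hive-site.xml.
spark.sql("show databases").show()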