I’m trying to integrate Spark (3.1.1) with a local Hive metastore (3.1.2) in order to use Spark SQL.
I configured spark-defaults.conf according to https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html, and the Hive jar files exist at the correct path.
However, the exception below occurred when I executed 'spark.sql("show tables").show'.
Any mistakes, hints, or corrections would be appreciated.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.1
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show tables").show
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructors(Class.java:1651)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:291)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:492)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
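(For a rough first check, the missing class can be probed from the same shell. The metastore client runs in an isolated classloader, so this only shows whether hive-exec is visible to the driver at all, not why the isolated loader failed.)
// probe driver-side visibility of the class from the stack trace above
try {
  Class.forName("org.apache.hadoop.hive.ql.metadata.HiveException")
  println("HiveException is visible to the driver")
} catch {
  case _: ClassNotFoundException => println("hive-exec is not on the driver classpath")
}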
spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://192.168.5.130:9000/spark
spark.history.fs.logDirectory hdfs://192.168.5.130:9000/spark
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.yarn.historyServer.address http://192.168.5.130:8188
spark.yarn.historyServer.allowTracking true
spark.sql.uris thrift://192.168.5.130:10000
spark.sql.warehouse.dir /user/hive/warehouse
spark.sql.hive.metastore.jars path
spark.sql.hive.metastore.jars.path file:///usr/local/hive/lib/*.jar
spark.sql.hive.metastore.version 3.1.2
spark.sql.hive.metastore.sharedPrefixes org.postgresql
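For reference, the effective values can be read back in spark-shell to confirm the file was picked up (same keys as above):
// read the static metastore settings back from the running session
Seq(
  "spark.sql.hive.metastore.version",
  "spark.sql.hive.metastore.jars",
  "spark.sql.hive.metastore.jars.path"
).foreach(k => println(s"$k = ${spark.conf.get(k, "<not set>")}"))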
ls /usr/local/hive/lib | grep hive
hive-accumulo-handler-3.1.2.jar
hive-beeline-3.1.2.jar
hive-classification-3.1.2.jar
hive-cli-3.1.2.jar
hive-common-3.1.2.jar
hive-contrib-3.1.2.jar
hive-druid-handler-3.1.2.jar
hive-exec-3.1.2.jar
hive-hbase-handler-3.1.2.jar
hive-hcatalog-core-3.1.2.jar
hive-hcatalog-server-extensions-3.1.2.jar
hive-hplsql-3.1.2.jar
hive-jdbc-3.1.2.jar
hive-jdbc-handler-3.1.2.jar
hive-kryo-registrator-3.1.2.jar
hive-llap-client-3.1.2.jar
hive-llap-common-3.1.2.jar
hive-llap-common-3.1.2-tests.jar
hive-llap-ext-client-3.1.2.jar
hive-llap-server-3.1.2.jar
hive-llap-tez-3.1.2.jar
hive-metastore-3.1.2.jar
hive-serde-3.1.2.jar
hive-service-3.1.2.jar
hive-service-rpc-3.1.2.jar
hive-shims-0.23-3.1.2.jar
hive-shims-3.1.2.jar
hive-shims-common-3.1.2.jar
hive-shims-scheduler-3.1.2.jar
hive-standalone-metastore-3.1.2.jar
hive-storage-api-2.7.0.jar
hive-streaming-3.1.2.jar
hive-testutils-3.1.2.jar
hive-upgrade-acid-3.1.2.jar
hive-vector-code-gen-3.1.2.jar
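Note the grep above only lists the hive-*.jar files, but spark.sql.hive.metastore.jars.path points at everything in that directory, so the dependency jars matter too. A quick check for the commons jars on the same path:
// list the commons-* dependency jars in the Hive lib directory;
// commons-collections-*.jar did not appear here in my case (see the fix at the end)
new java.io.File("/usr/local/hive/lib")
  .listFiles
  .map(_.getName)
  .filter(_.startsWith("commons-"))
  .sorted
  .foreach(println)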
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://192.168.5.130:5432/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value></property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://192.168.5.130:9000/user/hive/warehouse</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/lib</value>
</property>
</configuration>
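Once hive-site.xml is in $SPARK_HOME/conf, Spark should merge it into its Hadoop configuration, so the values it actually picked up can be checked like this (a minimal sketch; keys as in the file above):
// print the Hive settings as seen by the running session
val hadoopConf = spark.sparkContext.hadoopConfiguration
Seq("hive.metastore.warehouse.dir", "javax.jdo.option.ConnectionURL")
  .foreach(k => println(s"$k = ${hadoopConf.get(k)}"))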
After copying hive-site.xml to $SPARK_HOME/conf, a NoClassDefFoundError occurred for org/apache/commons/collections/CollectionUtils, as shown below.
spark.sql("show tables").show
scala> spark.sql("show tables").show
21/05/24 00:49:58 ERROR FileUtils: The jar file path file:///usr/local/hive/lib/*.jar doesn't exist
Hive Session ID = a6d63a41-e235-4d8c-a660-6f7b1a22996b
21/05/24 00:49:59 WARN ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
... (the same WARN line repeated 11 more times at 00:50:01 and 00:50:02) ...
21/05/24 00:50:02 WARN Hive: Failed to register all functions.
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:4299)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4367)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4347)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4603)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:291)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:274)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:435)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:375)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:355)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:331)
at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:257)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:384)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:134)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:154)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:152)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:946)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:932)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:924)
at org.apache.spark.sql.execution.command.ShowTablesCommand.$anonfun$run$43(tables.scala:868)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.command.ShowTablesCommand.run(tables.scala:868)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
at $line14.$read$$iw$$iw$$iw.<init>(<console>:36)
at $line14.$read$$iw$$iw.<init>(<console>:38)
at $line14.$read$$iw.<init>(<console>:40)
at $line14.$read.<init>(<console>:42)
at $line14.$read$.<init>(<console>:46)
at $line14.$read$.<clinit>(<console>)
at $line14.$eval$.$print$lzycompute(<console>:7)
at $line14.$eval$.$print(<console>:6)
at $line14.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
at org.apache.spark.repl.Main$.doMain(Main.scala:78)
at org.apache.spark.repl.Main$.main(Main.scala:58)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
... 101 more
Caused by: MetaException(message:org/apache/commons/collections/CollectionUtils)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:94)
... 106 more
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/collections/CollectionUtils
at org.apache.hadoop.hive.metastore.ObjectStore.grantPrivileges(ObjectStore.java:5709)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy39.grantPrivileges(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles_core(HiveMetaStore.java:828)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles(HiveMetaStore.java:794)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:539)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
... 110 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.collections.CollectionUtils
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:247)
at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:236)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 127 more
repeated logs ...
This error can be fixed by copying $SPARK_HOME/jars/commons-collections-3.2.2.jar to $HIVE_HOME/lib.
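To verify, the jar's presence on the path referenced by spark.sql.hive.metastore.jars.path can be checked, and the query re-run after restarting spark-shell (the isolated metastore classloader is built once per session):
// confirm the copied jar is now in the Hive lib directory
println(new java.io.File("/usr/local/hive/lib")
  .listFiles.exists(_.getName.startsWith("commons-collections")))  // expect: true
// after a spark-shell restart, the original query succeeds
spark.sql("show tables").show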