4 votes

I am running Spark on top of YARN on an Ubuntu 20.04 cluster. Versions:

  • Hadoop 3.2.2
  • Hive 3.1.2
  • Spark 3.1.1

I have created symlinks from Spark's jars into Hive's lib directory:

sudo ln -s $SPARK_HOME/jars/spark-network-common_2.12-3.1.1.jar $HIVE_HOME/lib/spark-network-common_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/spark-core_2.12-3.1.1.jar $HIVE_HOME/lib/spark-core_2.12-3.1.1.jar
sudo ln -s $SPARK_HOME/jars/scala-library-2.12.10.jar $HIVE_HOME/lib/scala-library-2.12.10.jar
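
A minimal check that these links actually resolve, using the same paths as above:

ls -lL $HIVE_HOME/lib/spark-*.jar $HIVE_HOME/lib/scala-library-*.jar   # -L follows the links; a broken link reports "cannot access ... No such file or directory"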

When running Hive with Spark set as its execution engine, I get the following error:

Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
2021-05-31T12:31:58,949 ERROR [a69d446a-f1a0-45d9-8dbc-c0fccbf718b3 main] spark.SparkTask: Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34)'
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session 57f08f6b-02b7-4c3d-bf8c-4ec351a5fd34
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getHiveException(SparkSessionImpl.java:221)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:92)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
        at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
        at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:115)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:218)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/unsafe/array/ByteArrayMethods
        at org.apache.spark.internal.config.package$.<init>(package.scala:1095)
        at org.apache.spark.internal.config.package$.<clinit>(package.scala)
        at org.apache.spark.SparkConf$.<init>(SparkConf.scala:654)
        at org.apache.spark.SparkConf$.<clinit>(SparkConf.scala)
        at org.apache.spark.SparkConf.set(SparkConf.scala:94)
        at org.apache.spark.SparkConf.set(SparkConf.scala:83)
        at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.generateSparkConf(HiveSparkClientFactory.java:265)
        at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:98)
        at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:76)
        at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:87)
        ... 24 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.unsafe.array.ByteArrayMethods
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 34 more

I downloaded Spark as the prebuilt package for Hadoop 3.2.0 and later; its jars directory contains Hive 2.3.0 jars, while my Hive installation is 3.1.2, so Hive's lib directory contains 3.1.2 jars.
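
Side note: the class named in the stack trace can be located among the Spark distribution's jars with a small check like the one below (it assumes the stock Spark 3.1.1 layout, where all jars sit under $SPARK_HOME/jars):

# Print every jar under $SPARK_HOME/jars that ships the class reported missing.
# On a stock Spark 3.1.1 build this should print spark-unsafe_2.12-3.1.1.jar,
# which is not among the three jars symlinked into Hive's lib above.
for j in "$SPARK_HOME"/jars/*.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/apache/spark/unsafe/array/ByteArrayMethods.class' && echo "$j"
done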

I have exactly the same versions installed and the same problem. My Hive query works well when the engine is set to "mr", but when the engine is changed to "spark", the same error happens. If anyone knows the answer, please share. Or, if anyone is sure which version of Spark supports Hadoop 3.2.2 and Hive 3.1.2, please also comment. I can easily re-install Spark, but it will be difficult to re-install Hadoop/Hive (as the data is already stored in HDFS). – Yu Xiang
Did you try adding the spark-unsafe jar from mvnrepository.com/artifact/org.apache.spark/spark-unsafe to your jar files? – Saša Zejnilović
No, I have not added the spark-unsafe jar. I will try that, but I would first like to know whether Spark 3 is even supported as Hive's engine, as I found this issue which says Hive does not yet support Spark 3: issues.apache.org/jira/plugins/servlet/mobile#issue/HIVE-25099 – Mohamed Ali

1 Answer

1 vote

Hive on Spark is only tested with a specific version of Spark, so a given release of Hive is only guaranteed to work with that specific Spark version. Other Spark versions may work with a given Hive release, but that is not guaranteed. Below is the list of Hive versions and their corresponding compatible Spark versions.

Please find the information in the Hive Spark Compatibility Chart (the version table on the Hive wiki's "Hive on Spark: Getting Started" page); for Hive 3.x the tested Spark version is 2.3.0.

Please use Spark 2.3.0, and take a look at Hive's pom.xml, which includes the following dependency:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-unsafe_2.11</artifactId>
  <version>2.3.0</version>
  <scope>compile</scope>
</dependency>
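
If you switch to a Spark 2.3.0 build, the same symlink approach from the question applies, just with the Scala 2.11 / 2.3.0 jar names, plus spark-unsafe (the jar suggested in the comments). A minimal sketch, assuming the prebuilt spark-2.3.0-bin-hadoop2.7 layout; adjust the install path and exact jar names to whatever your distribution actually ships:

# Hypothetical install path for a Spark 2.3.0 build
export SPARK_HOME=/opt/spark-2.3.0-bin-hadoop2.7
# Link the matching jars into Hive's lib, mirroring the original symlinks
sudo ln -s $SPARK_HOME/jars/scala-library-2.11.8.jar            $HIVE_HOME/lib/
sudo ln -s $SPARK_HOME/jars/spark-core_2.11-2.3.0.jar           $HIVE_HOME/lib/
sudo ln -s $SPARK_HOME/jars/spark-network-common_2.11-2.3.0.jar $HIVE_HOME/lib/
# spark-unsafe provides org.apache.spark.unsafe.array.ByteArrayMethods
sudo ln -s $SPARK_HOME/jars/spark-unsafe_2.11-2.3.0.jar         $HIVE_HOME/lib/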