
I have a problem with Hive on Spark. When I run a simple query like

select * from table_name

in the Hive console, everything works fine, but when I execute

select count(*) from table_name 

the query terminates with the following error:

Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

versions:

hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8

I have set spark.master in hive-site.xml, and now I get:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1840) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1361) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:180) ~[hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1
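
For reference, the hive-site.xml entry I mean is along these lines (together with hive.execution.engine, which Hive on Spark requires; the master URL below is only an example and depends on your cluster):

    <!-- hive-site.xml: route Hive queries to Spark; master URL is an example -->
    <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
    </property>
    <property>
      <name>spark.master</name>
      <value>spark://your-master-host:7077</value>
    </property>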


I have built Spark 1.6.1 and Hive 2.0.0 from source, and the error has changed to:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
    at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Can you please paste the full stack trace? – Ram Ghadiyaram
That means your jar files are not consistent, i.e., they were not compiled against the same code. Please go through the link; a similar kind of problem was discussed at issues.apache.org/jira/browse/HIVE-9970 – Ram Ghadiyaram

1 Answer


I ran into the same issue as you on Hive 2.0.0 and Spark 1.6.1. As mentioned in the comments, it has been discussed at issues.apache.org/jira/browse/HIVE-9970.

That said, for Hive:

  1. Download the Hive source package.
  2. Set the right Hadoop/Spark/Tez versions in pom.xml (a sketch of the relevant properties follows this list).
  3. Increase the memory limits for Maven. I use export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  4. Build Hive using Maven: mvn clean package -Pdist -DskipTests
  5. The result is at packaging/target/apache-hive-2.x.y-bin. Configure hive-site.xml there.
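
For step 2, the version properties in Hive's pom.xml look roughly like this; the property names are from memory and the values are only examples, so verify them against your actual pom.xml and cluster versions:

    <!-- pom.xml <properties> section: example versions, match them to your cluster -->
    <hadoop.version>2.7.2</hadoop.version>
    <spark.version>1.6.1</spark.version>
    <tez.version>0.8.2</tez.version>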

For Spark:

  1. Download the Spark source package.
  2. Set the right Hadoop version in pom.xml.
  3. Build Spark without Hive using ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
  4. The result is at dist/. Configure spark-defaults.conf (a sample is sketched below).
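
A minimal spark-defaults.conf for Hive on Spark might look like the following; every value here is an example and needs tuning for your own cluster:

    # spark-defaults.conf: example values only, adjust to your cluster
    spark.master                spark://your-master-host:7077
    spark.eventLog.enabled      true
    spark.eventLog.dir          hdfs:///spark-logs
    spark.executor.memory       2g
    spark.serializer            org.apache.spark.serializer.KryoSerializer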

Since you've built Spark without bundled Hadoop, you'll need to add the Hadoop package jars to $SPARK_DIST_CLASSPATH. See this documentation page. Additionally, you can read the Hive on Spark guide as a reference.
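
For example, something along these lines in conf/spark-env.sh should work, assuming the hadoop command is on your PATH:

    # conf/spark-env.sh: expose Hadoop's jars to the "hadoop-provided" Spark build
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)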